In object detection, gains in speed and accuracy mostly come from changes to the network architecture. This paper takes a different route towards that goal: the authors introduce a new loss function called focal loss.

The authors identify class imbalance as the main obstacle keeping one-stage detectors from matching the accuracy of two-stage detectors. The loss function they introduce is a dynamically scaled cross-entropy loss, where the scaling factor decays to zero as confidence in the correct class increases.

They add a modulating factor to the cross-entropy loss, as shown in the image below:

https://i.imgur.com/N7R3M9J.png

which ends up looking like this:

https://i.imgur.com/kxC8NCB.png

In experiments, though, they add an additional alpha weighting term to it because it gives them better results.

**RetinaNet**

The model is a single unified network composed of a backbone network and two task-specific subnetworks. The backbone network computes the feature maps for the input image. The first subnetwork performs object classification on the backbone network's output, and the second performs bounding box regression. The backbone they use is a Feature Pyramid Network, which they build on top of ResNet.
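
To make the focal loss idea from the first part of this summary concrete, here is a minimal NumPy sketch of the binary, alpha-weighted variant. It assumes the loss has the form `-alpha_t * (1 - p_t)**gamma * log(p_t)`, where `p_t` is the predicted probability of the true class. The function names, the `eps` clipping, and the default values `gamma=2.0` and `alpha=0.25` are assumptions made for illustration, not taken from the text above.

```python
import numpy as np


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy per example (no reduction).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth labels, 1 for positive and 0 for negative
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(y == 1, p, 1.0 - p)
    return -np.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss per example (no reduction).

    gamma and alpha defaults are illustrative assumptions.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    p_t = np.where(y == 1, p, 1.0 - p)
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # The modulating factor shrinks towards zero as p_t approaches 1,
    # so well-classified examples contribute little to the loss.
    modulating = (1.0 - p_t) ** gamma
    return -alpha_t * modulating * np.log(p_t)


# Usage: one easy positive, one hard positive, two negatives.
probs = np.array([0.9, 0.1, 0.6, 0.3])
labels = np.array([1, 1, 0, 0])
print("cross-entropy:", cross_entropy(probs, labels))
print("focal loss:   ", focal_loss(probs, labels))
```

With `gamma=0` the modulating factor is 1 everywhere, so the sketch reduces to alpha-weighted cross-entropy. Raising `gamma` down-weights easy examples more aggressively, which is how the loss counteracts the flood of easy background examples in a one-stage detector.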