The paper presents a new deep learning framework for person search. The authors unify two normally separate tasks, person detection and person re-identification, into a single person search problem solved with one Convolutional Neural Network (CNN). They also collect and annotate a new large-scale benchmark dataset for person search, containing $18{,}184$ images, $8{,}432$ identities, and $96{,}143$ pedestrian bounding boxes.

Conventional person re-identification approaches first detect people, then extract features for each detected person, and finally classify each person into an identity category (as depicted in the figure below). Instead of splitting the work into separate detection and classification stages, the paper solves the problem jointly with a single CNN similar to the Faster R-CNN framework.

https://i.imgur.com/ISDQd9L.png

The proposed framework (shown in the figure below) has a pedestrian proposal net, which detects people, and an identification net, which extracts features for comparison with the target person. The two modules adapt to each other through joint optimization.

In addition, a loss function called Online Instance Matching (OIM) is introduced to cope with the problems that Softmax and pairwise/triplet distance losses have when the number of identities is large. A lookup table (LUT) of features from all the labeled identities is maintained. The approach also keeps features of the many unlabeled identities likely to appear in scene images in a circular queue (CQ) and uses them as negatives for the labeled identities. There are no parameters to learn; the LUT and the CQ are just feature buffers. In the forward pass, each labeled identity is matched against all the stored features. In the backward pass, the LUT is updated according to the identity's ID, new features are pushed to the CQ, and out-of-date ones are popped.

https://i.imgur.com/1Smsi56.png

The approach is validated on the newly collected person search dataset. On this dataset, the training accuracy when using Softmax loss is around $15\%$.
With the OIM loss, however, the accuracy improves consistently. Experiments also compare the method with baseline approaches: the baseline result is around $74\%$, while the proposed approach reaches $76.1\%$ without unlabeled identities and $78.7\%$ with them.
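
The OIM mechanism described above (a lookup table plus a circular queue, matched in the forward pass and refreshed in the backward pass) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the class name `OIMBuffer`, the `temperature` and `momentum` values, the softmax-style matching over cosine similarities, and the moving-average LUT update are all assumptions made for the sketch and are not stated in this summary.

```python
import math
from collections import deque


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OIMBuffer:
    """Hypothetical OIM-style buffers: no learnable parameters, only stored features."""

    def __init__(self, num_ids, feat_dim, queue_size, temperature=0.1, momentum=0.5):
        # LUT: one feature slot per labeled identity.
        self.lut = [[0.0] * feat_dim for _ in range(num_ids)]
        # CQ: fixed-size queue of unlabeled features; the oldest entry is
        # dropped automatically once maxlen is reached.
        self.cq = deque(maxlen=queue_size)
        self.temperature = temperature  # assumed softmax temperature
        self.momentum = momentum        # assumed LUT update rate

    def loss(self, feature, label):
        """Forward pass: match a labeled feature against every stored feature."""
        x = l2_normalize(feature)
        stored = self.lut + list(self.cq)  # unlabeled entries act as negatives
        logits = [dot(v, x) / self.temperature for v in stored]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        return log_z - logits[label]  # negative log-probability of the true ID

    def update(self, feature, label=None):
        """Backward-pass bookkeeping: refresh the LUT slot or push to the CQ."""
        x = l2_normalize(feature)
        if label is None:
            self.cq.append(x)
        else:
            mixed = [self.momentum * v + (1 - self.momentum) * xi
                     for v, xi in zip(self.lut[label], x)]
            self.lut[label] = l2_normalize(mixed)


buf = OIMBuffer(num_ids=2, feat_dim=2, queue_size=2)
buf.update([1.0, 0.0], label=0)   # labeled feature goes to the LUT
buf.update([0.0, 1.0], label=1)
buf.update([0.6, 0.8])            # unlabeled feature goes to the CQ
print(buf.loss([1.0, 0.0], label=0))
```

Because the buffers are plain feature stores, adding a new identity in this sketch only means adding one more LUT slot, whereas a Softmax classifier would need an extra learned weight vector per identity.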