**Contributions:**
- Triplet ranking loss, implemented in a three-stream Siamese network.
- Integrates a region proposal network into the system. All operations are differentiable, making the system end-to-end trainable.
- Proposes a dataset cleaning method, which is critical for the performance boost.
- Performance surpasses previous global descriptors and most local-descriptor-based methods on the Landmarks dataset.

**Training:**
- Sample triplets and apply a triplet hinge loss: $L(I_q, I^+, I^-)=\max(0, m + q^T d^- - q^T d^+)$, where $q$, $d^+$ and $d^-$ are the descriptors of the query, positive and negative images, and $m$ is the margin.
- Since only the convolutional layers of the CNN are used, and aggregation does not require a fixed input size, images can be processed at full resolution.

**Network data flow:**
- Use the convolutional layers of a pre-trained network to extract activation features.
- Max-pool over different regions, using a multi-scale rigid grid with overlapping cells. Note that ROI pooling is differentiable.
- L2-normalize the region features, whiten with PCA, and L2-normalize again. The PCA projection can be implemented as a shift followed by a fully connected (FC) layer.
- Aggregate: sum the region features and L2-normalize.
- The dot-product similarity between two image vectors approximates a many-to-many region matching.

**Region Proposal Network**
- The objective function is a multi-task loss that combines a classification loss and a regression loss.
- At inference time, perform non-maximum suppression and keep the top K proposals for each image.

**Landmark Dataset Cleaning**
- Construct an image graph whose edges are weighted by similarity scores. The scores are computed offline, using invariant keypoint matching and spatial verification.
- Extract the connected components of the graph. They correspond to different profiles of a landmark.
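
The aggregation step and the triplet hinge loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the margin value of 0.1, and the random region features are assumptions made for illustration, and the PCA whitening step between the two normalizations is left out for brevity.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm (eps guards against division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def aggregate_regions(region_features):
    """Turn an (R, D) array of region features into one unit-norm image vector.

    Follows the notes: L2-normalize each region, sum, L2-normalize again.
    PCA whitening is omitted here (assumption, for brevity).
    """
    regions = l2_normalize(region_features, axis=1)
    return l2_normalize(regions.sum(axis=0), axis=0)

def triplet_hinge_loss(q, d_pos, d_neg, margin=0.1):
    """L = max(0, m + q.d_neg - q.d_pos); margin=0.1 is a placeholder value."""
    return max(0.0, margin + float(q @ d_neg) - float(q @ d_pos))

# Hypothetical data: 5 regions with 8-dimensional features.
rng = np.random.default_rng(0)
q = aggregate_regions(rng.random((5, 8)))
d_pos = q.copy()   # positive identical to the query: q.d_pos = 1
d_neg = -q         # negative opposite to the query: q.d_neg = -1

# Positive already beats the negative by more than the margin, so no penalty.
easy = triplet_hinge_loss(q, d_pos, d_neg)
# Swapping the roles violates the ranking, so the loss is positive.
hard = triplet_hinge_loss(q, d_neg, d_pos)
```

Because the descriptors are unit-norm, the dot products lie in [-1, 1], so the loss is zero exactly when the positive's similarity exceeds the negative's by at least the margin.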