- Fast: the model is simply a linear projection **W** from image features to semantic tags. At test time, prediction cost is nearly O(1) with respect to the training-set size, whereas nearest-neighbor methods need at least O(N).
- Enrich incomplete tags: learn a tag enrichment projection **B** that turns on tags likely to co-occur with the existing ones.
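
The linear-projection idea can be sketched with a closed-form ridge regression. This is a minimal illustration rather than the paper's full objective: it assumes **B** is fixed to the identity, so **W** is fit directly against the observed tags, and the names `fit_linear_tagger`, `predict_tags`, and `lam` are hypothetical.

```python
import numpy as np

def fit_linear_tagger(X, T, lam=1.0):
    """Ridge-regression fit of W so that W @ x approximates a tag vector.

    X: (d, n) image features, one column per image.
    T: (k, n) tag matrix, one column per image.
    Assumption: the enrichment projection B is the identity here.
    """
    d = X.shape[0]
    # Closed form: W = T X^T (X X^T + lam * I)^{-1}
    A = X @ X.T + lam * np.eye(d)
    return np.linalg.solve(A, X @ T.T).T

def predict_tags(W, x, top=3):
    """Score every tag with one matrix-vector product; return top indices."""
    scores = W @ x
    return np.argsort(-scores)[:top]
```

Prediction touches only `W` and the query feature, which is why its cost does not grow with the number of training images.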
**Marginalized blank-out regularization**
- Assume the observed tags are corrupted, and approximate the unknown corrupting distribution with a piecewise uniform distribution.
- Stacking: stack multiple linear projection layers to reconstruct tags that do not co-occur directly but tend to appear in similar contexts.
- Rare tags and non-uniform corruption: only optimize tags whose recall on the validation set falls below some threshold.
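
To make the marginalization concrete, here is a simplified sketch. It assumes a single uniform blank-out probability `p` for every tag (not the piecewise uniform scheme above) and considers only the reconstruction term in isolation, so it is not the paper's joint objective. Under those assumptions the expected loss over corruptions has a closed-form minimizer; the function name and `eps` are hypothetical.

```python
import numpy as np

def marginalized_enrichment(T, p=0.5, eps=1e-6):
    """Closed-form B minimizing E||T - B @ T_corrupt||^2.

    T: (k, n) binary tag matrix. Each tag entry is assumed to be zeroed
    independently with probability p (uniform corruption).
    """
    k = T.shape[0]
    q = 1.0 - p                      # probability that a tag survives
    S = T @ T.T                      # tag co-occurrence counts, (k, k)
    # E[T_corrupt T_corrupt^T]: off-diagonal scaled by q^2, diagonal by q
    Q = S * (q * q)
    np.fill_diagonal(Q, q * np.diag(S))
    P = q * S                        # E[T T_corrupt^T]
    # B = P Q^{-1}; Q is symmetric, so solve the transposed system
    return np.linalg.solve(Q + eps * np.eye(k), P.T).T
```

With `p > 0` the off-diagonal entries of `B` become positive for tags that co-occur, so a surviving tag can switch on a missing partner.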
- Triplet ranking loss, implemented as a three-stream Siamese network.
- Integrate a region proposal network into the system. All operations are differentiable, making the system end-to-end trainable.
- Proposes a dataset cleaning method, which is critical to the performance boost.
- Performance surpasses previous global descriptors and most local-descriptor-based methods on the Landmarks dataset.
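- Sample triplets and train with a triplet hinge loss, where $q$, $d^+$, and $d^-$ are the descriptors of the query, relevant, and non-relevant images and $m$ is the margin:
$L(I_q, I^+, I^-)=\max(0, m+q^Td^- - q^Td^+)$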
- Since the CNN uses only convolutional layers and the aggregation does not require a fixed input size, images can be processed at full resolution.
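
The triplet hinge loss above is small enough to write out directly. The sketch below assumes the three descriptors are already L2-normalized NumPy vectors; the default `margin` value is an arbitrary placeholder, not a value taken from the paper.

```python
import numpy as np

def triplet_hinge_loss(q, d_pos, d_neg, margin=0.1):
    """max(0, m + q.d_neg - q.d_pos) for one (query, positive, negative) triplet."""
    return max(0.0, margin + float(q @ d_neg) - float(q @ d_pos))
```

The loss is zero once the positive scores higher than the negative by at least the margin, so only violating triplets contribute gradient.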
**Network data flow:**
- Use the convolutional layers of a pre-trained network to extract activation features.
- Max-pool over different regions, defined by a multi-scale rigid grid with overlapping cells. Note that ROI pooling is differentiable.
- L2-normalize the region features, whiten with PCA, and L2-normalize again. The PCA projection can be implemented as a shift followed by an FC layer.
- Aggregate: sum the region features and L2-normalize.
- The dot-product similarity of two image vectors approximates many-to-many region matching.
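
The data flow above can be sketched in NumPy on a single feature map. The grid layout in `rigid_grid` is a hypothetical simplification (the exact cell sizes and overlaps are not given in these notes), and `mean` and `proj` stand in for PCA parameters that would be learned elsewhere.

```python
import numpy as np

def l2n(v, eps=1e-12):
    """L2-normalize along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def rigid_grid(h, w, scales=(1, 2, 3)):
    """Hypothetical multi-scale grid: s x s overlapping cells at scale s."""
    regions = []
    for s in scales:
        ch, cw = max(1, 2 * h // (s + 1)), max(1, 2 * w // (s + 1))
        ys = np.linspace(0, h - ch, s).round().astype(int)
        xs = np.linspace(0, w - cw, s).round().astype(int)
        regions += [(y, y + ch, x, x + cw) for y in ys for x in xs]
    return regions

def image_descriptor(fmap, regions, mean, proj):
    """fmap: (C, H, W) activations; mean: (C,); proj: (C, D)."""
    # Max-pool each region into one C-dimensional feature
    feats = np.stack([fmap[:, y0:y1, x0:x1].max(axis=(1, 2))
                      for y0, y1, x0, x1 in regions])
    feats = l2n(feats)
    # PCA whitening written as a shift followed by a linear (FC) layer
    feats = l2n((feats - mean) @ proj)
    # Sum-aggregate the regions and normalize the final image vector
    return l2n(feats.sum(axis=0))
```

Two images are then compared with a plain dot product of their descriptors, which expands into the sum of all pairwise region similarities up to the final normalization.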
**Region Proposal Network**
- The objective function is a multi-task loss that combines a classification loss and a regression loss.
- At inference, apply non-maximum suppression and keep the top K proposals for each image.
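
The suppression step can be sketched as greedy NMS followed by a cap on the number of kept boxes. The box format `(x1, y1, x2, y2)`, the IoU threshold, and the default `top_k` are assumptions made for illustration, and boxes are assumed to have positive area.

```python
import numpy as np

def nms_top_k(boxes, scores, iou_thresh=0.7, top_k=300):
    """Greedy NMS; returns indices of at most top_k kept boxes, best first.

    boxes: (n, 4) array of x1, y1, x2, y2; scores: (n,) array.
    """
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < top_k:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the best box with every remaining box
        iw = np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0])
        ih = np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1])
        inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too much
        order = rest[iou <= iou_thresh]
    return keep
```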
**Landmark Dataset Cleaning**
- Construct an image graph whose edges are weighted by similarity score. The scores are computed offline, using invariant keypoint matching and spatial verification.
- Extract connected components in the graph. They correspond to different profiles of a landmark.
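
The component extraction can be sketched with a small union-find. The edge scores are taken as given (the keypoint matching itself is not shown), and discarding edges below `min_score` before grouping is an assumption of this sketch.

```python
def connected_components(n, edges, min_score=0.0):
    """Group image indices 0..n-1 into components, largest first.

    edges: iterable of (i, j, score) with a precomputed similarity score.
    """
    parent = list(range(n))

    def find(a):
        # Follow parent links to the root, halving the path as we go
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j, score in edges:
        if score >= min_score:
            parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values(), key=len, reverse=True)
```

Images that end up in singleton components have no sufficiently similar neighbor and are natural candidates for removal.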