Learning Deep Features for Scene Recognition using Places Database
Zhou, Bolei and Lapedriza, Àgata and Xiao, Jianxiong and Torralba, Antonio and Oliva, Aude (2014)
Paper summary by abhshkdz

This paper introduces the Places dataset, a scene-centric dataset at
the scale of ImageNet (which is object-centric). Its size makes it
possible to train deep CNNs such as AlexNet on scenes, and the resulting
models achieve state-of-the-art results on scene benchmarks.

Main contributions:
- Collects a dataset at ImageNet scale for scene recognition.
- Achieves state-of-the-art on scene benchmarks: SUN397, MIT Indoor67, Scene15, SUN Attribute.
- Introduces measures for comparing datasets: density and diversity.
- Makes a thorough comparison between ImageNet and Places, covering the datasets themselves, classification results, and visualizations of the learned representations.
- Relative density and diversity are neat ideas for comparing datasets, and are backed by AMT experiments.
    - Relative density: a dataset is denser the more visually similar a randomly sampled image is to its nearest neighbour in that dataset.
    - Relative diversity: a dataset is less diverse the more visually similar two randomly sampled images from it are.
- Demonstrates via activation and mean image visualizations that CNNs trained on ImageNet and on Places learn different representations.
    - Conv1 filters can be visualized directly and are similar for ImageNet-CNN and Places-CNN. They capture low-level information such as oriented edges and colors.
    - For higher layers, they visualize the average of the top 100 images that maximize the activation of each unit. Going deeper, ImageNet-CNN units have receptive fields that look more like object blobs, while Places-CNN units have receptive fields that look more like landscapes with spatial structure.
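
The relative density and diversity measures described above are pairwise comparisons between two datasets, so they can be estimated by repeated sampling. The sketch below is only an illustration. The paper backs these measures with AMT (human) similarity judgments, whereas this code substitutes Euclidean distance between feature vectors as a stand-in, which is an assumption. The function names `nn_distance`, `relative_density` and `relative_diversity` are hypothetical and do not come from the paper.

```python
import math
import random


def nn_distance(feats, idx):
    """Distance from feats[idx] to its nearest neighbour in the same set."""
    return min(math.dist(feats[idx], f) for k, f in enumerate(feats) if k != idx)


def relative_density(feats_a, feats_b, trials=500, seed=0):
    """Fraction of trials in which a random image from A is closer to its
    nearest neighbour than a random image from B is to its own.

    Assumption: Euclidean distance replaces the human similarity judgment.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        i = rng.randrange(len(feats_a))
        j = rng.randrange(len(feats_b))
        if nn_distance(feats_a, i) < nn_distance(feats_b, j):
            wins += 1
    return wins / trials


def relative_diversity(feats_a, feats_b, trials=500, seed=0):
    """Fraction of trials in which a random pair from A is less similar
    (farther apart) than a random pair from B.

    Assumption: Euclidean distance replaces the human similarity judgment.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a1, a2 = rng.sample(range(len(feats_a)), 2)
        b1, b2 = rng.sample(range(len(feats_b)), 2)
        dist_a = math.dist(feats_a[a1], feats_a[a2])
        dist_b = math.dist(feats_b[b1], feats_b[b2])
        if dist_a > dist_b:
            wins += 1
    return wins / trials
```

A value near 1 from `relative_density(A, B)` would suggest A is denser than B under this proxy, and a value near 1 from `relative_diversity(A, B)` would suggest A is more diverse than B. Both depend heavily on the chosen feature space.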
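
The mean-image visualization used for higher layers can be sketched in a few lines: rank images by how strongly they activate a given unit, keep the top k, and average them pixel-wise. This is a minimal sketch, not the authors' code. It assumes one scalar activation per image for the chosen unit and represents images as plain nested lists of floats. The name `mean_top_k_image` is hypothetical, and a practical version would likely use an array library instead.

```python
def mean_top_k_image(images, activations, k=100):
    """Pixel-wise average of the k images with the highest activation.

    images: list of equally sized 2D grids (lists of rows of floats).
    activations: one scalar per image for the unit being visualized
                 (an assumption made for this sketch).
    """
    k = min(k, len(images))
    # Indices of the k most strongly activating images.
    top = sorted(range(len(images)), key=lambda i: activations[i], reverse=True)[:k]
    height, width = len(images[0]), len(images[0][0])
    return [
        [sum(images[i][r][c] for i in top) / k for c in range(width)]
        for r in range(height)
    ]
```

For example, with five 1x1 "images" holding the values 0 to 4 and activations `[0.1, 0.9, 0.5, 0.8, 0.2]`, calling the function with `k=2` averages the images at indices 1 and 3.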
## Weaknesses / Notes
- The paper does not explain why the model trained on ImageNet and Places combined (minus overlapping images) performs better than ImageNet-CNN or Places-CNN on some benchmarks and worse on others.