Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Simonyan, Karen and Vedaldi, Andrea and Zisserman, Andrew, 2013
Paper summary by abhshkdz
This paper attempts to understand the representations learnt by deep
convolutional neural networks by introducing two interpretable visualization
techniques. Main contributions:
- Class model visualizations
    - These are obtained by numerically optimizing in input space to maximize
    the class score. Gradients are computed with respect to the input and used
    to update the input image (initialized to the zero image), while the
    weights stay fixed at the values obtained from training.
- Image-specific saliency map visualizations
    - These reuse the same gradient (of the class score with respect to the
    input), evaluated at the given image with a single backward pass. Taking the
    maximum absolute value across color channels at each pixel produces the
    saliency map.
- Relation between DeconvNet and optimization-based visualizations
    - Visualizations using a DeconvNet are the same as gradient-based ones except
    at the ReLU layers. In regular backprop, gradients flow through a ReLU only to
    units whose input activations were positive in the forward pass, whereas in a
    DeconvNet the rectification is applied to the backward signal itself, so only
    positive output reconstructions are passed on.
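
The ReLU difference described above can be made concrete with a small sketch. This is a toy illustration in plain Python, not code from the paper: the function names and the example vectors are assumptions chosen for readability, and a real network would apply these rules tensor-wise inside its backward pass.

```python
def relu_backward_backprop(forward_input, backward_signal):
    """Standard backprop through ReLU: keep the signal only where the
    forward-pass input to the ReLU was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, backward_signal)]


def relu_backward_deconvnet(backward_signal):
    """DeconvNet-style rule: rectify the backward signal itself,
    ignoring what the forward-pass input looked like."""
    return [g if g > 0 else 0.0 for g in backward_signal]


# Hypothetical values for four units of one ReLU layer.
forward_input = [1.5, -0.3, 2.0, -1.0]
backward_signal = [0.4, 0.7, -0.2, -0.5]

print(relu_backward_backprop(forward_input, backward_signal))  # [0.4, 0.0, -0.2, 0.0]
print(relu_backward_deconvnet(backward_signal))                # [0.4, 0.7, 0.0, 0.0]
```

The two rules use different masks: backprop masks by the sign of the forward input, the DeconvNet rule by the sign of the signal travelling backwards. Everything else in the backward pass is shared.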
## Strengths
- The visualization techniques are simple ideas and the results are interpretable. They show
that the method proposed by Erhan et al. in an unsupervised setting is also applicable to CNNs
trained in a supervised manner.
- The image-specific class saliency can be read as highlighting the pixels that need to change
the least to affect the class score the most.
- The relation between DeconvNet visualizations and optimization-based visualizations is
## Weaknesses / Notes
- The paper does not explain the rationale for initializing with the zero image or for using
L2 regularization in class model visualizations.
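
For reference, the class model optimization that these notes refer to maximizes the regularized class score, S_c(I) - λ‖I‖², over the input image I, starting from the zero image and keeping the trained weights fixed. Below is a minimal sketch of that loop in plain Python. It assumes a toy linear scorer S_c(I) = w · I + b in place of a trained ConvNet, so that the gradient with respect to the input is simply w. The names `class_score_grad`, `l2_weight`, `step_size` and `num_steps` are illustrative, and in practice the gradient would come from backpropagation through the real network.

```python
def class_score_grad(image, weights):
    """Gradient of the toy class score S_c(I) = w . I + b with respect to I.
    For a linear scorer this is just w; a ConvNet would use backprop here."""
    return list(weights)


def class_model_visualization(weights, l2_weight=0.5, step_size=0.1, num_steps=200):
    """Gradient ascent on S_c(I) - l2_weight * ||I||^2, starting from zeros."""
    image = [0.0] * len(weights)  # zero-image initialization
    for _ in range(num_steps):
        grad = class_score_grad(image, weights)
        # Ascent step: score gradient minus the gradient of the L2 penalty.
        image = [x + step_size * (g - 2.0 * l2_weight * x)
                 for x, g in zip(image, grad)]
    return image


toy_weights = [0.2, -1.0, 0.6]  # hypothetical "trained" weights for one class
result = class_model_visualization(toy_weights)
print([round(v, 4) for v in result])
```

In this toy case the iterate converges to w / (2 · l2_weight), and without the penalty the linear score could be increased without bound. That shows what the L2 term does in this sketch. It is not a statement of the authors' motivation.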
arXiv e-Print archive, 2013
This paper presents methods for visualizing the behaviour of an object recognition convolutional neural network. The first method generates a "canonical image" for a given class, i.e. an input that the network scores highly for that class. The second generates a saliency map for a given input image and specified class, which shows the pixels that most influence that class's score. The saliency map can be used to seed a graph cut segmentation and localize objects of that class in the input image. Finally, a connection is established between the saliency map method and the work of Zeiler and Fergus on using deconvolutional networks to visualize deep networks.
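
The saliency map and the seeding step can be sketched as follows. This is a hedged, dependency-free illustration rather than the authors' implementation. It assumes the gradient of the class score with respect to the image is already available as a nested list `grad[channel][row][col]`, reduces it to one value per pixel by taking the maximum absolute value over channels, and then marks the most salient pixels as candidate foreground seeds. The helper `seed_mask` and its `quantile` parameter are illustrative names, and the threshold used here is an arbitrary example value.

```python
def saliency_map(grad):
    """Collapse a per-channel input gradient grad[c][y][x] into a saliency
    map M[y][x] = max over channels c of |grad[c][y][x]|."""
    channels, height, width = len(grad), len(grad[0]), len(grad[0][0])
    return [[max(abs(grad[c][y][x]) for c in range(channels))
             for x in range(width)]
            for y in range(height)]


def seed_mask(saliency, quantile):
    """Mark pixels whose saliency is at or above the given quantile of the
    map's values; such pixels could serve as foreground seeds."""
    values = sorted(v for row in saliency for v in row)
    index = min(int(quantile * len(values)), len(values) - 1)
    threshold = values[index]
    return [[v >= threshold for v in row] for row in saliency]


# Hypothetical gradient for a 2x2 image with 3 color channels.
grad = [
    [[0.1, -0.5], [0.0, 0.2]],
    [[-0.3, 0.2], [0.1, -0.1]],
    [[0.2, 0.4], [-0.6, 0.05]],
]

saliency = saliency_map(grad)
print(saliency)                   # [[0.3, 0.5], [0.6, 0.2]]
print(seed_mask(saliency, 0.75))  # [[False, False], [True, False]]
```

A segmentation method such as graph cut would take masks like this as its initial foreground (and, with a low-saliency mask, background) evidence. That step is outside the scope of this sketch.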