A paper at the intersection of Computer Vision and Machine Learning. The authors:

- simplify networks by replacing max-pooling with convolutions of higher stride (see the sketch below)
- introduce a new variant of the "deconvolution approach" for visualizing features learned by CNNs, which can be applied to a broader range of network structures than existing approaches

## Datasets

Competitive or state-of-the-art performance on several object recognition datasets (CIFAR-10, CIFAR-100, ImageNet).
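To make the pooling removal concrete, here is a minimal sketch in PyTorch (the channel count and input size are assumed for illustration, not taken from the paper) showing that a 3x3 convolution with stride 2 downsamples a feature map to the same resolution as a 2x2 max-pooling layer:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 96, 32, 32)  # a single 96-channel feature map (assumed sizes)

# Conventional block: 3x3 convolution followed by 2x2 max-pooling.
pooled = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# All-convolutional block: the pooling layer is removed and the
# convolution itself downsamples by using stride 2.
strided = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
)

print(pooled(x).shape)   # torch.Size([1, 96, 16, 16])
print(strided(x).shape)  # torch.Size([1, 96, 16, 16])
```

Both blocks halve the spatial resolution, but the strided convolution learns its downsampling weights instead of applying a fixed maximum.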
This paper simplifies the convolutional network proposed by Alex Krizhevsky by replacing max-pooling with strided convolutions (under the assumption that max-pooling is needed only for dimensionality reduction). They also propose a novel technique for visualizing representations learnt by intermediate layers that produces nicer visualizations in input pixel space than the DeconvNet (Zeiler et al.) and saliency map (Simonyan et al.) approaches.

## Strengths

- Their model performs on par with or better than the original AlexNet formulation.
- Max-pooling replaced by convolution with stride 2.
- Fully-connected layers replaced by 1x1 convolutions and global averaging + softmax (see the first sketch after this section).
- Smaller filter sizes (same intuition as the VGGNet paper).
- Combining the DeconvNet (Zeiler et al.) and backpropagation (Simonyan et al.) approaches at the ReLU operator (the only point where they differ) by masking out values where at least one of the input activation or the output reconstruction is negative (guided backprop) is neat and leads to nice visualizations (see the second sketch after this section).

## Weaknesses / Notes

- Saliency maps generated from guided backpropagation look much better than DeconvNet visualizations and the saliency maps from Simonyan et al.'s paper. It probably works better because negative saliency values can only arise from the very first convolution, since negative error signals are never propagated back through the non-linearities.
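A minimal sketch of the all-convolutional classification head mentioned above, in PyTorch; the channel count (192) and class count (10) are assumed values for a CIFAR-10-sized model, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Fully-connected layers are replaced by a 1x1 convolution that maps
# features to per-location class scores, followed by global averaging
# over the spatial dimensions and a softmax.
head = nn.Sequential(
    nn.Conv2d(192, 10, kernel_size=1),  # 1x1 conv: class scores at each location
    nn.AdaptiveAvgPool2d(1),            # global average pooling -> (N, 10, 1, 1)
    nn.Flatten(),                       # (N, 10, 1, 1) -> (N, 10)
    nn.Softmax(dim=1),
)

features = torch.randn(4, 192, 8, 8)  # feature maps from the convolutional trunk
print(head(features).shape)           # torch.Size([4, 10])
```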
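And a sketch of the guided backpropagation rule, written as a custom PyTorch autograd function (a hypothetical helper, not the paper's code): the forward pass is an ordinary ReLU, while the backward pass combines the two masking rules at exactly the point where DeconvNet and plain backpropagation differ.

```python
import torch

class GuidedReLU(torch.autograd.Function):
    # Forward pass: an ordinary ReLU.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x.clamp(min=0)

    # Backward pass: pass the gradient only where BOTH the input activation
    # was positive (the backprop rule) AND the incoming error signal is
    # positive (the DeconvNet rule) -- i.e. mask out values where at least
    # one of the two is negative.
    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x > 0).float() * (grad_output > 0).float()
```

To use it, swap every ReLU in the network for `GuidedReLU.apply` and backpropagate from a chosen activation down to the input image; the resulting input gradient is the guided-backprop visualization.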