Deep Networks with Stochastic Depth
Paper summary

TLDR; The authors randomly drop complete layers during training using a modified ResNet architecture. The survival probability of a layer decreases linearly with depth (higher layers have a higher chance of being dropped), ending at 0.5 at the final layer in the experiments. This mechanism helps with vanishing gradients, diminishing feature reuse, and long training time. The model achieves new records on the CIFAR-10, CIFAR-100 and SVHN datasets.

#### Key Points:

- The ResNet architecture can easily be modified to drop a whole layer by keeping only the identity skip connection.
- Lower layers get a lower probability of being dropped since they intuitively contain more "stable" features. The authors use linear decay of the survival probability with a final value of 0.5.
- Training time is reduced by 25% - 50%, depending on the dropout probability hyperparameter.
- The authors find that vanishing gradients are indeed reduced by plotting the gradient magnitudes vs. number of epochs.
- The method can be interpreted as an ensemble of networks with varying depth.
- All layers are used at test time, so activations need to be scaled appropriately (each residual branch is weighted by its survival probability).
- The authors successfully train a network with 1000+ layers and achieve further error reduction.
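
The points above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`survival_probs`, `forward`), the use of lists of floats in place of feature maps, and the toy residual functions are all assumptions made for illustration. It shows the linear decay schedule, the random dropping of residual branches during training, and the scaling of each branch by its survival probability at test time.

```python
import random


def survival_probs(num_blocks, p_final=0.5):
    """Linearly decaying survival probabilities p_1..p_L, ending at p_final."""
    return [1.0 - (l / num_blocks) * (1.0 - p_final)
            for l in range(1, num_blocks + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def forward(x, residual_fns, probs, training, rng=random):
    """Pass x through residual blocks with stochastic depth.

    residual_fns are hypothetical stand-ins for the residual branches;
    each maps a list of floats to a list of floats of the same length.
    """
    h = list(x)
    for f, p in zip(residual_fns, probs):
        if training:
            # Keep the residual branch with probability p; otherwise only the
            # identity skip connection is used and h passes through unchanged.
            if rng.random() < p:
                h = relu([a + b for a, b in zip(h, f(h))])
        else:
            # Test time: every block is active, residual branch scaled by p.
            h = relu([a + p * b for a, b in zip(h, f(h))])
    return h


if __name__ == "__main__":
    probs = survival_probs(4)
    print(probs)       # [0.875, 0.75, 0.625, 0.5]
    print(sum(probs))  # 2.75, the expected number of active blocks out of 4
```

Under this schedule the expected number of active blocks is the sum of the survival probabilities, which approaches three quarters of the total depth as the network gets deeper. That is consistent with the lower end of the training-time reduction listed above; lower final survival probabilities would drop more blocks.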
Deep Networks with Stochastic Depth
Huang, Gao and Sun, Yu and Liu, Zhuang and Sedra, Daniel and Weinberger, Kilian
- 2016 via Bibsonomy
Keywords: deeplearning, acreuser
