Residual Networks of Residual Networks: Multilevel Residual Networks
Ke Zhang, Miao Sun, Tony X. Han, Xingfang Yuan, Liru Guo, Tao Liu
arXiv e-Print archive, 2016
Keywords: cs.CV
First published: 2016/08/09
Abstract: A residual-network family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to fully exploit the optimization ability of residual networks. RoR substitutes optimizing the residual mapping of a residual mapping for optimizing the original residual mapping. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance in all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on the ImageNet data set.
This paper introduces a modification to the ResNet architecture that adds multi-level shortcut connections (a level-1 shortcut from the input to the pre-final layer, level-2 shortcuts over each group of residual blocks, and so on), as opposed to the single-level shortcut connections of prior ResNet work. The authors experiment with multi-level shortcuts on regular ResNets, ResNets with pre-activation, and Wide ResNets. Combined with drop-path regularization via stochastic depth, and with a search over the optimal number of shortcut levels and the optimal depth/width ratio to avoid vanishing gradients and overfitting, this architecture achieves state-of-the-art error rates on CIFAR-10 (3.77%), CIFAR-100 (19.73%), and SVHN (1.59%).
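To make the multi-level structure concrete, here is a minimal PyTorch sketch of a three-level RoR in the spirit of the paper's RoR-3 models: the usual per-block residual connections form level 3, an extra shortcut over each group of blocks forms level 2, and a root shortcut from the first convolution's output to the pre-final layer forms level 1. The layer sizes, group counts, and the use of 1x1 projections on every upper-level shortcut are illustrative assumptions, not the authors' exact configuration.

```python
import torch.nn as nn
import torch.nn.functional as F


class BasicBlock(nn.Module):
    """Level-3 (innermost) residual block: two 3x3 convolutions."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the level-3 shortcut only when the shape changes.
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
                         if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # level-3 shortcut


class RoRGroup(nn.Module):
    """A group of residual blocks plus a level-2 shortcut spanning the group."""

    def __init__(self, in_ch, out_ch, num_blocks, stride):
        super().__init__()
        blocks = [BasicBlock(in_ch, out_ch, stride)]
        blocks += [BasicBlock(out_ch, out_ch) for _ in range(num_blocks - 1)]
        self.blocks = nn.Sequential(*blocks)
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
                         if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        return F.relu(self.blocks(x) + self.shortcut(x))  # level-2 shortcut


class RoR3(nn.Module):
    """Three-level RoR: a level-1 shortcut spans all groups (input to pre-final layer)."""

    def __init__(self, num_classes=10, num_blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, 1, 1, bias=False)
        self.groups = nn.Sequential(
            RoRGroup(16, 16, num_blocks, stride=1),
            RoRGroup(16, 32, num_blocks, stride=2),
            RoRGroup(32, 64, num_blocks, stride=2),
        )
        # Level-1 (root) shortcut: 1x1 projection matching the final feature shape.
        self.root_shortcut = nn.Conv2d(16, 64, 1, stride=4, bias=False)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.stem(x)
        out = F.relu(self.groups(x) + self.root_shortcut(x))  # level-1 shortcut
        out = F.adaptive_avg_pool2d(out, 1).flatten(1)
        return self.fc(out)
```

Each added level introduces only an extra identity or projection path, so gradients can bypass progressively larger portions of the network, which is the optimization benefit the paper attributes to RoR.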
## Strengths
- Fairly exhaustive set of experiments over:
  - Shortcut level numbers.
  - Identity mapping types: 1) zero-padding shortcuts, 2) 1x1 convolution projections where dimensions change and identity elsewhere, and 3) 1x1 convolutions on all shortcuts (see the sketch after this list).
  - Residual block size (two or three 3x3 convolutional layers).
  - Depths (110, 164, 182, 218) and widths for both ResNets and Pre-ResNets.
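The three identity-mapping types in the list above correspond to the shortcut options A/B/C studied in the original ResNet paper. As a minimal sketch (an assumed PyTorch rendering, not the authors' code), the three variants differ only in how the shortcut path is built:

```python
import torch
import torch.nn.functional as F


def shortcut_A(x, out_ch, stride):
    """Type A: parameter-free; subsample spatially, zero-pad the extra channels."""
    x = x[:, :, ::stride, ::stride]
    missing = out_ch - x.size(1)
    # F.pad pads from the last dimension backwards: (W, W, H, H, C, C).
    return F.pad(x, (0, 0, 0, 0, 0, missing))


def make_shortcut_B(in_ch, out_ch, stride):
    """Type B: 1x1 projection only where dimensions change, identity elsewhere."""
    if stride != 1 or in_ch != out_ch:
        return torch.nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
    return torch.nn.Identity()


def make_shortcut_C(in_ch, out_ch, stride):
    """Type C: 1x1 projection on every shortcut, even when shapes already match."""
    return torch.nn.Conv2d(in_ch, out_ch, 1, stride, bias=False)
```

Type A adds no parameters, while types B and C trade extra parameters for more expressive shortcuts; the choice matters more here than in plain ResNets because shortcuts are stacked across multiple levels.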