GradNets: Dynamic Interpolation Between Neural Architectures
Paper summary
A common practice in deep learning is to design the network first, "freeze" the architecture, and then train the parameters. The paper points out a dilemma with this approach: more complex networks may have greater representational power but can be harder to train. To address this, the paper proposes training the network in a hybrid fashion, where a simpler component and a more complex component are combined through a weighted average. The weight is adjusted over the course of training so that the more complex component is phased in gradually, while the early stages benefit from the fast training of the simpler one. Any two architectural components can be blended this way as optimisation progresses: the initial component, e.g. one rectifier, is gradually switched off in favour of another. The authors claim that this strategy leads to faster convergence and present some experimental results.
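The weighted-average idea described above can be illustrated with a minimal sketch. The summary does not state the schedule or the specific components, so everything here is an assumption made for illustration: a linear ramp of the blend weight over a hypothetical `ramp_steps`, an identity function as the "simple" component, and a ReLU as the "complex" one. The function names are invented for this example and are not taken from the paper.

```python
def blend_weight(step, ramp_steps):
    """Blend weight in [0, 1]: rises linearly with the training step,
    then stays at 1. The linear ramp is an assumed schedule."""
    return min(step / ramp_steps, 1.0)


def identity(x):
    """Assumed 'simple' component: passes the input through unchanged."""
    return x


def relu(x):
    """Assumed 'complex' component: standard rectifier."""
    return max(x, 0.0)


def blended(x, step, ramp_steps, simple=identity, target=relu):
    """Weighted average of the two components.

    At step 0 the output equals simple(x); once step >= ramp_steps
    it equals target(x); in between it is a convex combination.
    """
    g = blend_weight(step, ramp_steps)
    return (1.0 - g) * simple(x) + g * target(x)


if __name__ == "__main__":
    # Show how the response to a negative input changes during training.
    for step in (0, 25, 50, 100, 200):
        print(step, blended(-2.0, step, ramp_steps=100))
```

Early in training the blended unit behaves like the simple component, and by the end of the ramp it behaves exactly like the target component, which matches the "gradually switched off" behaviour the summary describes.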
GradNets: Dynamic Interpolation Between Neural Architectures
Almeida, Diogo and Sauder, Nate
arXiv e-Print archive - 2015 via Local Bibsonomy

Summary by Open Review 4 years ago