All you need is a good init
Paper summary: Learning works best when layer inputs have zero mean and unit variance (mean(input) = 0, var(input) = 1) and when input features are independent. The proposed layer-sequential unit-variance (LSUV) initialization therefore proceeds as: 1) Pre-initialize network weights with (approximately) orthonormal matrices. 2) Do a forward pass with a mini-batch. 3) Divide each layer's weights by $\sqrt{\mathrm{var}(\text{output})}$, layer by layer, until the output variance is close to 1. 4) PROFIT!
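The steps above can be sketched for a stack of purely linear layers, assuming NumPy; the real procedure runs on the full network including nonlinearities, which is why step 3 may need a few iterations per layer (the `tol`, `max_iter`, and layer shapes below are illustrative choices, not from the paper):

```python
import numpy as np

def orthonormal(shape, rng):
    # Step 1: pre-initialize with an (approximately) orthonormal matrix via QR.
    a = rng.standard_normal(shape)
    q, _ = np.linalg.qr(a)
    return q

def lsuv_init(weights, x, tol=0.1, max_iter=10):
    """Layer-sequential unit-variance (LSUV) init, linear-layer sketch.

    weights: list of (in_dim, out_dim) arrays, pre-initialized orthonormally.
    x: a mini-batch of inputs with shape (batch, in_dim).
    """
    h = x
    for W in weights:
        for _ in range(max_iter):
            out = h @ W          # Step 2: forward pass with the mini-batch.
            v = out.var()
            if abs(v - 1.0) < tol:
                break
            W /= np.sqrt(v)      # Step 3: rescale weights toward unit output variance.
        h = h @ W                # Propagate the normalized activations to the next layer.
    return weights

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 64))            # mini-batch with unit-variance features
ws = [orthonormal((64, 64), rng) for _ in range(3)]
ws = lsuv_init(ws, x)
```

With linear layers a single rescaling already gives exactly unit variance on the batch; the loop matters once activations pass through nonlinearities, where the variance after rescaling is no longer analytically 1.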
Mishkin, Dmytro and Matas, Jiri
arXiv e-Print archive - 2015 via Local Bibsonomy

Summary by Dmytro Mishkin 4 years ago
