Deep Information Propagation
Schoenholz, Samuel S. and Gilmer, Justin and Ganguli, Surya and Sohl-Dickstein, Jascha (2016)

_Objective:_ Fundamental analysis of random, untrained networks using mean-field theory. Introduces two depth scales that control how far information can propagate: one for the magnitude of a single input and one for the correlation between two inputs.
## Results:
A guide for choosing hyper-parameters so that a random network is nearly critical (on the boundary between order and chaos). This in turn means that information can propagate both forward and backward through the network, so it is trainable (no vanishing or exploding gradients).
In practice: for any given number of layers and initialization variances for the weights and biases, the theory tells you whether the network will be trainable, making it a kind of architecture validation tool.
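To make the criterion concrete, here is a minimal sketch of the mean-field check for a tanh network, assuming the recursions from the paper: q* is the fixed point of the variance (length) map, χ₁ = σ_w² E[φ'(√q* z)²] separates order (χ₁ < 1) from chaos (χ₁ > 1), and correlations survive over a depth scale ξ_c ≈ 1/|ln χ₁|. The function name, the Monte Carlo estimation of the Gaussian integrals, and the example values are my own choices; the "depth ≲ 6ξ_c" threshold is the paper's empirical rule of thumb.

```python
import numpy as np

def depth_scale_check(sigma_w2, sigma_b2, depth, n_mc=200_000, seed=0):
    """Mean-field trainability check for a random tanh network.

    sigma_w2, sigma_b2: initialization variances of the weights and
    biases (the paper's sigma_w^2 and sigma_b^2). Gaussian integrals
    are estimated by Monte Carlo over n_mc standard-normal samples.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_mc)

    # Fixed point of the length map: q <- sigma_w2 * E[tanh(sqrt(q) z)^2] + sigma_b2
    q = 1.0
    for _ in range(200):
        q = sigma_w2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b2

    # chi_1 = sigma_w2 * E[phi'(sqrt(q*) z)^2], with phi' = 1 - tanh^2.
    # chi_1 < 1: ordered phase; chi_1 > 1: chaotic; chi_1 = 1: critical.
    chi1 = sigma_w2 * np.mean((1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)

    # Correlations between inputs only survive over ~xi_c layers.
    xi_c = np.inf if abs(np.log(chi1)) < 1e-6 else 1.0 / abs(np.log(chi1))
    trainable = depth < 6 * xi_c  # the paper's empirical "depth < ~6 xi_c" rule
    return q, chi1, xi_c, trainable

# Scan sigma_w^2 at fixed sigma_b^2 to watch chi_1 cross 1 at the critical point.
for sw2 in (0.5, 1.0, 1.5, 2.5):
    q, chi1, xi_c, ok = depth_scale_check(sw2, 0.05, depth=100)
    print(f"sigma_w^2={sw2:.2f}: chi_1={chi1:.3f}, xi_c={xi_c:.1f}, trainable@100={ok}")
```

Near the crossing point ξ_c diverges, which is why nearly-critical initializations allow arbitrarily deep trainable networks.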
**To be noted:** any amount of dropout destroys the critical point and therefore implies an upper bound on trainable network depth.
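The dropout claim can be checked the same way. In the paper's mean-field treatment, dropout with keep probability ρ rescales the single-input variance map by 1/ρ while leaving the two-input covariance map untouched, so c* = 1 stops being a fixed point of the correlation map for any ρ < 1 and ξ_c stays finite. A rough sketch under those assumptions (function name and example values are again mine):

```python
import numpy as np

def correlation_fixed_point(sigma_w2, sigma_b2, rho=1.0, n_mc=200_000, seed=0):
    """Mean-field correlation fixed point c* for a tanh network with
    dropout keep-probability rho (rho = 1.0 means no dropout)."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal((2, n_mc))

    # Variance map with dropout: q <- (sigma_w2 / rho) * E[phi^2] + sigma_b2
    q = 1.0
    for _ in range(200):
        q = (sigma_w2 / rho) * np.mean(np.tanh(np.sqrt(q) * z1) ** 2) + sigma_b2

    # Correlation map at the variance fixed point; note the missing 1/rho,
    # which is exactly what destroys the c* = 1 fixed point when rho < 1.
    c = 0.9
    for _ in range(500):
        u1 = np.sqrt(q) * z1
        u2 = np.sqrt(q) * (c * z1 + np.sqrt(max(1.0 - c * c, 0.0)) * z2)
        c = (sigma_w2 * np.mean(np.tanh(u1) * np.tanh(u2)) + sigma_b2) / q
    return q, c

# In the ordered phase without dropout c* -> 1; any rho < 1 pushes c* below 1.
for rho in (1.0, 0.95, 0.8):
    q_star, c_star = correlation_fixed_point(1.0, 0.05, rho=rho)
    print(f"rho={rho:.2f}: q*={q_star:.3f}, c*={c_star:.4f}")
```

With ρ = 1 the loop recovers c* = 1 in the ordered phase; for any ρ < 1 the printed c* drops strictly below 1, which is the mechanism behind the bound on trainable depth.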
## Caveats:
* Considers only bounded activation functions: no ReLU, etc.
* Applies directly only to fully-connected feed-forward networks: no convnets, etc.
