Summary from Joseph Paul Cohen
The weights $W$ at each layer are initialized according to the number of connections the layer has. Each $w \in W$ is drawn from a Gaussian distribution with mean $\mu = 0$ and the following variance:
$$\text{Var}(W) = \frac{2}{n_\text{in}+ n_\text{out}}$$
Here $n_\text{in}$ is the number of neurons feeding into the layer (the previous layer in the feedforward direction) and $n_\text{out}$ is the number of neurons the layer feeds into (the previous layer in the backprop direction).
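As a rough illustration of the formula, the sketch below draws a weight matrix with the stated variance using only the Python standard library. The function names `xavier_normal` and `sample_variance`, the layer sizes, and the seed are assumptions made for this example, not part of the original summary.

```python
import math
import random


def xavier_normal(n_in, n_out, seed=None):
    """Return an n_in x n_out matrix with entries ~ N(0, 2 / (n_in + n_out))."""
    rng = random.Random(seed)
    # random.gauss takes a standard deviation, so take the square root of the variance.
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


def sample_variance(matrix):
    """Population variance of all entries in a nested-list matrix."""
    values = [w for row in matrix for w in row]
    mean = sum(values) / len(values)
    return sum((w - mean) ** 2 for w in values) / len(values)


# Hypothetical layer sizes chosen only for the demonstration.
W = xavier_normal(n_in=300, n_out=100, seed=0)
target = 2.0 / (300 + 100)
print(f"target variance: {target:.5f}, empirical: {sample_variance(W):.5f}")
```

With 30,000 sampled weights the empirical variance should land close to the target of $2 / (300 + 100) = 0.005$, though the exact value depends on the seed.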
Reference: [Andy Jones's Blog](http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization)
