Early Inference in Energy-Based Models Approximates Back-Propagation
Yoshua Bengio and Asja Fischer, 2015
Paper summary by petered
# Very Short
The authors define a neural network as a nonlinear dynamical system whose fixed points correspond to the minima of some **energy function**. They then show that if one starts at a fixed point and *perturbs* the output units in the direction that reduces a loss, the initial perturbation that flows back through the network is proportional to the gradient of this loss with respect to the neural activations. Thus, the initial propagation of those perturbations (i.e. **early inference**) **approximates** the **backpropagated** gradients of the loss.
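One way to see why the early flow carries gradient information is a first-order expansion. What follows is a sketch under assumptions that are not stated in the summary itself: the state splits into hidden units $h$ and output units $y$, the dynamics are $\dot{s} = -\partial E / \partial s$ with fixed point $(h^*, y^*)$, the outputs are nudged by $\Delta y = -\beta\, \partial C / \partial y$ for a loss $C$ and a small $\beta > 0$, and $\partial^2 E / \partial y^2 = I$ at the fixed point. Because $\partial E / \partial h$ vanishes at $(h^*, y^*)$, the hidden units initially move as

$$
\dot{h} = -\frac{\partial E}{\partial h}(h^*, y^* + \Delta y) \approx -\frac{\partial^2 E}{\partial h\, \partial y}\, \Delta y .
$$

If $y = f(h)$ denotes the solution of $\partial E / \partial y = 0$, implicit differentiation under the assumption above gives $\partial f / \partial h = -\partial^2 E / \partial y\, \partial h$, and therefore

$$
\dot{h} \approx \left(\frac{\partial f}{\partial h}\right)^{\top} \Delta y = -\beta \left(\frac{\partial f}{\partial h}\right)^{\top} \frac{\partial C}{\partial y} = -\beta\, \frac{\partial C}{\partial h},
$$

where $\partial C / \partial h$ is the gradient that back-propagation would compute through the map $h \mapsto f(h)$. The relation is only first order in $\Delta y$, which is why the match holds for the early steps of inference rather than for the whole trajectory.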
First published: 2015/10/09
Abstract: We show that Langevin MCMC inference in an energy-based model with latent
variables has the property that the early steps of inference, starting from a
stationary point, correspond to propagating error gradients into internal
layers, similarly to back-propagation. The error that is back-propagated is
with respect to visible units that have received an outside driving force
pushing them away from the stationary point. Back-propagated error gradients
correspond to temporal derivatives of the activation of hidden units. This
observation could be an element of a theory for explaining how brains perform
credit assignment in deep hierarchies as efficiently as back-propagation does.
In this theory, the continuous-valued latent variables correspond to averaged
voltage potential (across time, spikes, and possibly neurons in the same
minicolumn), and neural computation corresponds to approximate inference and
error back-propagation at the same time.
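
The claim that temporal derivatives of hidden units match back-propagated gradients can be checked numerically on a small example. The sketch below is not the authors' model: it drops the Langevin noise and assumes a toy energy $E(h, y) = \frac{1}{2}\|h\|^2 + \frac{1}{2}\|y\|^2 - \rho(h)^\top V x - y^\top W \rho(h)$ with $\rho = \tanh$, a squared-error loss on the outputs, and hand-picked weights. All names (`V`, `W`, `BETA`, `relax`, `h_dot`, and so on) are hypothetical, and a finite-difference gradient stands in for back-propagation through the read-out $y = W\rho(h)$.

```python
import math

# Hand-picked toy parameters (all hypothetical, kept small so that the
# relaxation converges quickly).
V = [[0.5, -0.3, 0.2], [0.1, 0.4, -0.6], [-0.2, 0.3, 0.5]]  # input -> hidden
W = [[0.4, -0.5, 0.3], [-0.2, 0.1, 0.6]]                    # hidden <-> output
x = [1.0, -0.5, 0.8]   # clamped input
t = [0.5, -0.3]        # target for the output units
BETA = 0.1             # strength of the nudge applied to the outputs


def matvec(M, v):
    """Product of a list-of-rows matrix with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rho(h):
    """Elementwise nonlinearity rho = tanh."""
    return [math.tanh(a) for a in h]


def drho(h):
    """Elementwise derivative of tanh."""
    return [1.0 - math.tanh(a) ** 2 for a in h]


def h_dot(h, y):
    """dh/dt = -dE/dh = -h + rho'(h) * (V x + W^T y)."""
    W_T = [list(col) for col in zip(*W)]
    drive = [a + b for a, b in zip(matvec(V, x), matvec(W_T, y))]
    return [-hi + di * ui for hi, di, ui in zip(h, drho(h), drive)]


def y_dot(h, y):
    """dy/dt = -dE/dy = -y + W rho(h)."""
    return [-yi + fi for yi, fi in zip(y, matvec(W, rho(h)))]


def relax(steps=5000, dt=0.05):
    """Euler-integrate the dynamics from zero to an approximate fixed point."""
    h, y = [0.0] * len(V), [0.0] * len(W)
    for _ in range(steps):
        dh, dy = h_dot(h, y), y_dot(h, y)
        h = [a + dt * b for a, b in zip(h, dh)]
        y = [a + dt * b for a, b in zip(y, dy)]
    return h, y


def cost(h):
    """C(h) = 0.5 * ||W rho(h) - t||^2, the loss of the feedforward read-out."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(matvec(W, rho(h)), t))


def grad_cost(h, eps=1e-6):
    """Central finite-difference estimate of dC/dh (stands in for backprop)."""
    grad = []
    for i in range(len(h)):
        up, down = list(h), list(h)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2 * eps))
    return grad


h_star, y_star = relax()
residual = max(abs(v) for v in h_dot(h_star, y_star) + y_dot(h_star, y_star))

# Nudge the outputs a little way down the loss gradient: dC/dy = y - t.
y_nudged = [yi - BETA * (yi - ti) for yi, ti in zip(y_star, t)]

early = h_dot(h_star, y_nudged)                    # first instant of inference
backprop = [-BETA * g for g in grad_cost(h_star)]  # scaled descent direction
max_err = max(abs(a - b) for a, b in zip(early, backprop))

print("fixed-point residual:", residual)
print("early dh/dt:         ", early)
print("-beta * dC/dh:       ", backprop)
print("max abs difference:  ", max_err)
```

For this particular energy the hidden dynamics are linear in $y$, so the two vectors should agree up to numerical error. With an energy that is nonlinear in the outputs, or once the hidden units have started to move, the agreement is only approximate, which is the sense of "early inference" in the abstract.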