Optimal Brain Damage
Paper summary

Optimal Brain Damage (OBD) is a technique to make a network smaller by pruning the weights whose removal increases the training error the least.

## Idea

* Use second-derivative information to make a tradeoff between network complexity and training error.
* Do this while training to prevent overfitting / reduce the need for data / reduce training time.
* **How to choose what to delete**: Delete the weights which have the least impact on the training error. This impact is estimated by approximating the error function with a Taylor series.

## Recipe

(Directly copied from the paper):

The OBD procedure can be carried out as follows:

1. Choose a reasonable network architecture
2. Train the network until a reasonable solution is obtained
3. Compute the second derivatives $h_{kk}$ for each parameter
4. Compute the saliencies for each parameter: $s_k = h_{kk} u_k^2 /2$
5. Sort the parameters by saliency and delete some low-saliency parameters
6. Iterate to step 2

Deleting a parameter is defined as setting it to 0 and freezing it there. Several variants of the procedure can be devised, such as decreasing the values of the low-saliency parameters instead of simply setting them to 0, or allowing the deleted parameters to adapt again after they have been set to 0.
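To make the recipe concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes a toy linear model with error $E = \frac{1}{2n}\sum (u \cdot x - y)^2$, for which the diagonal second derivative is exactly $h_{kk} = \frac{1}{n}\sum x_k^2$; for a real multi-layer network $h_{kk}$ has to be computed by a backpropagation-like pass instead. All function names (`train`, `diag_hessian`, `saliencies`, `obd`) and the toy data are made up for illustration. The data is chosen so that the smallest-magnitude weight is *not* the one with the lowest saliency.

```python
from itertools import product


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, xs, ys):
    """Half mean squared error: E = 1/(2n) * sum (w.x - y)^2."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def train(w, mask, xs, ys, lr=0.5, steps=1000):
    """Plain gradient descent; deleted weights (mask False) stay frozen at 0."""
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for k, xk in enumerate(x):
                grad[k] += err * xk / n
        w = [wk - lr * gk if m else 0.0 for wk, gk, m in zip(w, grad, mask)]
    return w


def diag_hessian(xs):
    """Exact h_kk for the linear model above (assumption: toy model only)."""
    n = len(xs)
    return [sum(x[k] ** 2 for x in xs) / n for k in range(len(xs[0]))]


def saliencies(w, h):
    """Step 4: s_k = h_kk * u_k^2 / 2."""
    return [hk * wk ** 2 / 2 for hk, wk in zip(h, w)]


def obd(xs, ys, n_delete):
    d = len(xs[0])
    w, mask = [0.0] * d, [True] * d
    w = train(w, mask, xs, ys)                    # step 2: train
    for _ in range(n_delete):
        h = diag_hessian(xs)                      # step 3: second derivatives
        s = saliencies(w, h)                      # step 4: saliencies
        k = min((i for i in range(d) if mask[i]), key=lambda i: s[i])
        mask[k], w[k] = False, 0.0                # step 5: delete lowest saliency
        w = train(w, mask, xs, ys)                # step 6: back to step 2
    return w, mask


# Toy data: the third input has a small scale, so its weight matters little
# for the error even though it is larger in magnitude than the first weight.
xs = [list(x) for x in product((-1.0, 1.0), (-1.0, 1.0), (-0.2, 0.2))]
true_w = [0.5, -2.0, 1.0]
ys = [predict(true_w, x) for x in xs]

w, mask = obd(xs, ys, n_delete=1)
print(mask)  # [True, True, False]
```

In this toy problem, pruning by magnitude would remove the first weight (0.5), while the saliency criterion removes the third one (1.0), because its input has little effect on the error. The error is exactly quadratic with a diagonal Hessian here, so the saliency equals the true increase in error after deletion. That will not hold in general.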
## See also

* 1989: Optimal Brain Damage ([original pdf](https://papers.nips.cc/paper/250-optimal-brain-damage.pdf), [nice pdf](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf), [txt](https://github.com/NicolasEstrada/nlp/blob/master/nipstxt/nips02/0598.txt))
* 1993: [Optimal Brain Surgeon](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiS92) ([pdf](https://papers.nips.cc/paper/647-second-order-derivatives-for-network-pruning-optimal-brain-surgeon.pdf) and [follow-up](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiSW93), [2](http://www.shortscience.org/paper?bibtexKey=conf/epia/EndischHS07))
* 1998: LeNet-5
* 2012: AlexNet
* 2015: [Learning both Weights and Connections for Efficient Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1506.02626)
* 2016: [Neural networks with differentiable structure](http://www.shortscience.org/paper?bibtexKey=journals%2Fcorr%2F1606.06216#martinthoma)

Summary by Martin Thoma 4 years ago