Optimal Brain Damage (OBD) is a technique to make a network smaller by pruning the weights that matter least for the training error.

## Idea

* Use second-derivative information to make a tradeoff between network complexity and training error.
* Do this while training to prevent overfitting, reduce the need for data, and reduce training time.
* **How to choose what to delete**: Delete the weights which have the least impact on the training error. This impact is estimated by approximating the error function with a Taylor series.

## Recipe

(Directly copied from the paper):

The OBD procedure can be carried out as follows:

1. Choose a reasonable network architecture
2. Train the network until a reasonable solution is obtained
3. Compute the second derivatives $h_{kk}$ for each parameter
4. Compute the saliencies for each parameter: $s_k = h_{kk} u_k^2 / 2$
5. Sort the parameters by saliency and delete some low-saliency parameters
6. Iterate to step 2

Deleting a parameter is defined as setting it to 0 and freezing it there. Several variants of the procedure can be devised, such as decreasing the values of the low-saliency parameters instead of simply setting them to 0, or allowing the deleted parameters to adapt again after they have been set to 0.
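
The recipe above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a linear least-squares model instead of a neural network, because there the diagonal second derivatives are known exactly ($h_{kk}$ is the mean of the squared $k$-th input feature). For a real network, $h_{kk}$ would have to be computed or approximated separately. The function names `obd_saliencies`, `prune_lowest` and `train`, the synthetic data, and the choice to delete one parameter per round are all hypothetical.

```python
import numpy as np

def obd_saliencies(weights, hessian_diag):
    """Saliency s_k = h_kk * u_k^2 / 2 for each parameter u_k."""
    return 0.5 * hessian_diag * weights ** 2

def prune_lowest(weights, saliencies, mask, n_delete):
    """Set the n_delete still-active parameters with lowest saliency to 0."""
    active = np.flatnonzero(mask)
    order = active[np.argsort(saliencies[active])]
    new_mask = mask.copy()
    new_mask[order[:n_delete]] = False
    return weights * new_mask, new_mask

def train(X, y, weights, mask, lr=0.1, steps=500):
    """Gradient descent on E = ||Xw - y||^2 / (2n); deleted weights stay frozen at 0."""
    n = len(y)
    for _ in range(steps):
        grad = X.T @ (X @ weights - y) / n
        weights = (weights - lr * grad) * mask
    return weights

# Synthetic data (assumption): only three of the six inputs matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([2.0, -1.5, 0.0, 0.0, 0.5, 0.0])
y = X @ true_w + 0.01 * rng.normal(size=200)

mask = np.ones(6, dtype=bool)
w = train(X, y, np.zeros(6), mask)                   # step 2: train
for _ in range(3):
    h_diag = np.mean(X ** 2, axis=0)                 # step 3: exact h_kk for this loss
    s = obd_saliencies(w, h_diag)                    # step 4: saliencies
    w, mask = prune_lowest(w, s, mask, n_delete=1)   # step 5: delete lowest saliency
    w = train(X, y, w, mask)                         # step 6: back to step 2

print(mask)
print(w)
```

The mask implements "setting it to 0 and freezing it there": a deleted weight is multiplied by `False` after every update, so it never adapts again. The variant that lets deleted parameters adapt would drop the mask from `train`.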

## See also

* 1989: Optimal Brain Damage ([original pdf](https://papers.nips.cc/paper/250optimalbraindamage.pdf), [nice pdf](http://yann.lecun.com/exdb/publis/pdf/lecun90b.pdf), [txt](https://github.com/NicolasEstrada/nlp/blob/master/nipstxt/nips02/0598.txt))
* 1993: [Optimal Brain Surgeon](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiS92) ([pdf](https://papers.nips.cc/paper/647secondorderderivativesfornetworkpruningoptimalbrainsurgeon.pdf) and [follow-up](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiSW93), [2](http://www.shortscience.org/paper?bibtexKey=conf/epia/EndischHS07))
* 1998: LeNet-5
* 2012: AlexNet
* 2015: [Learning both Weights and Connections for Efficient Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1506.02626)
* 2016: [Neural networks with differentiable structure](http://www.shortscience.org/paper?bibtexKey=journals%2Fcorr%2F1606.06216#martinthoma)
