Optimal Brain Damage. LeCun, Yann and Denker, John S. and Solla, Sara A. 1989

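Paper summary by martinthoma

Optimal Brain Damage (OBD) is a technique for making a network smaller by pruning the weights that matter least for the training error, which are not necessarily the weights with the smallest magnitude.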
## Idea
* Use second-derivative information to trade off network complexity against training error.
* Interleave pruning with training (train, prune, retrain), aiming for better generalization, fewer required training examples, and faster learning or classification.
* **How to choose what to delete**: Delete the weights whose removal has the least impact on the training error. This impact is estimated with a second-order Taylor approximation of the error function.
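
As a sketch of the reasoning behind this estimate, in the paper's notation ($u_i$ are the parameters, $g_i$ the gradient components and $h_{ij}$ the Hessian entries of the error $E$), the change in error caused by a perturbation $\delta u$ of the parameters is

$$\delta E = \sum_i g_i \,\delta u_i + \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\|\delta u\|^3)$$

Assuming that the network sits at a local minimum ($g_i \approx 0$), that the cross terms can be neglected, and that the cubic remainder can be dropped, this leaves

$$\delta E \approx \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2$$

Deleting parameter $k$ corresponds to $\delta u_k = -u_k$, so the estimated increase in error is $h_{kk} u_k^2 / 2$, which is the saliency used in the recipe.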
## Recipe
(Directly copied from the paper):
The OBD procedure can be carried out as follows:
1. Choose a reasonable network architecture
2. Train the network until a reasonable solution is obtained
3. Compute the second derivatives $h_{kk}$ for each parameter
4. Compute the saliencies for each parameter: $s_k = h_{kk} u_k^2 /2$
5. Sort the parameters by saliency and delete some low-saliency parameters
6. Iterate to step 2
Deleting a parameter is defined as setting it to 0 and freezing it there. Several variants of the procedure can be devised, such as decreasing the values of the low-saliency parameters instead of simply setting them to 0, or allowing the deleted parameters to adapt again after they have been set to 0.
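
The steps above can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation: it assumes a linear least-squares model, for which the diagonal second derivatives are available in closed form, whereas the paper computes $h_{kk}$ for a neural network with a backpropagation-like procedure. The helper names (`train`, `hessian_diagonal`, `obd_prune`), the learning rate, and the toy data are all hypothetical.

```python
import numpy as np


def train(X, y, w, mask, lr=0.1, steps=500):
    """Gradient descent on E(w) = mean((X @ w - y)**2) / 2.

    Parameters whose mask entry is 0 are kept frozen at 0.
    """
    n = len(y)
    w = w * mask
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = (w - lr * grad) * mask  # deleted parameters stay at 0
    return w


def hessian_diagonal(X):
    """Exact h_kk for the least-squares error above: the mean of x_k**2.

    Assumption: this closed form only holds for the toy linear model.
    """
    return np.mean(X ** 2, axis=0)


def obd_prune(X, y, n_rounds=2, n_delete=1):
    """Steps 2 to 6 of the recipe for the toy linear model."""
    w = np.zeros(X.shape[1])
    mask = np.ones_like(w)
    for _ in range(n_rounds):
        w = train(X, y, w, mask)                   # step 2: train
        h = hessian_diagonal(X)                    # step 3: second derivatives
        saliency = h * w ** 2 / 2                  # step 4: s_k = h_kk u_k^2 / 2
        saliency[mask == 0] = np.inf               # skip already deleted parameters
        lowest = np.argsort(saliency)[:n_delete]   # step 5: lowest saliency first
        mask[lowest] = 0.0
    return train(X, y, w, mask), mask              # retrain after the last deletion


# Toy data: two of the five true weights are exactly zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([2.0, 0.0, -3.0, 0.0, 0.5])
y = X @ true_w

w, mask = obd_prune(X, y, n_rounds=2, n_delete=1)
```

On this noise-free toy data the two parameters whose true value is zero have negligible saliency, so they are the ones deleted. The mask implements the "set to 0 and freeze" definition of deletion. The variants mentioned in the quoted text (shrinking instead of zeroing, or letting deleted parameters adapt again) would change how the mask is applied in `train`.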
## See also
* 1989: Optimal Brain Damage ([original pdf](https://papers.nips.cc/paper/250-optimal-brain-damage.pdf), [nice pdf](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf), [txt](https://github.com/NicolasEstrada/nlp/blob/master/nipstxt/nips02/0598.txt))
* 1993: [Optimal Brain Surgeon](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiS92) ([pdf](https://papers.nips.cc/paper/647-second-order-derivatives-for-network-pruning-optimal-brain-surgeon.pdf) and [follow-up](http://www.shortscience.org/paper?bibtexKey=conf/nips/HassibiSW93), [2](http://www.shortscience.org/paper?bibtexKey=conf/epia/EndischHS07))
* 1998: LeNet-5
* 2012: AlexNet
* 2015: [Learning both Weights and Connections for Efficient Neural Networks](http://www.shortscience.org/paper?bibtexKey=journals/corr/1506.02626)
* 2016: [Neural networks with differentiable structure](http://www.shortscience.org/paper?bibtexKey=journals%2Fcorr%2F1606.06216#martinthoma)
