Overcoming catastrophic forgetting in neural networks
Kirkpatrick, James and Pascanu, Razvan and Rabinowitz, Neil and Veness, Joel and Desjardins, Guillaume and Rusu, Andrei A. and Milan, Kieran and Quan, John and Ramalho, Tiago and Grabska-Barwinska, Agnieszka and Hassabis, Demis and Clopath, Claudia and Kumaran, Dharshan and Hadsell, Raia
2016

Paper summary
luyuchen
This paper proposes a simple method for training a network on new tasks sequentially while avoiding catastrophic forgetting. It starts from the Bayesian formulation of learning a model:
$$
\log P(\theta | D) = \log P(D | \theta) + \log P(\theta) - \log P(D)
$$
Replacing the prior with the posterior from the previous task(s) gives
$$
\log P(\theta | D) = \log P(D | \theta) + \log P(\theta | D_{prev}) - \log P(D)
$$
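The paper approximates this posterior with a Gaussian centered at the previous solution, with diagonal precision given by the Fisher information:
$$
P(\theta | D_{prev}) \approx N(\theta^{prev*}, diag(F)^{-1})
$$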
where $F$ is the Fisher Information matrix $E_x[ \nabla_\theta \log P(x|\theta) (\nabla_\theta \log P(x|\theta))^T]$. Then the resulting objective function is
$$
L(\theta) = L_{new}(\theta) + \frac{\lambda}{2}\sum_i F_{ii} (\theta_i - \theta^{prev*}_i)^2
$$
where $L_{new}$ is the loss on the new task and $\theta^{prev*}$ is the best parameter found on the previous task(s). The penalty can be viewed as a distance that uses the Fisher information matrix to scale each dimension, and the experiments show that this scaling matters by comparing against a plain $L_2$ distance.
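
To make the penalty term concrete, here is a minimal pure-Python sketch. It is not the authors' code: the function names (`diagonal_fisher`, `ewc_penalty`, `ewc_loss`) and the toy numbers are made up for illustration, and the diagonal of $F$ is assumed to be estimated as the mean of squared per-example gradients of $\log P(x|\theta)$ at the previous solution.

```python
# Minimal sketch of the Fisher-weighted quadratic penalty (no framework needed).
# All names and numbers below are hypothetical, chosen for illustration only.

def diagonal_fisher(per_example_grads):
    """Estimate diag(F) as the mean of squared per-example gradients
    of log P(x | theta), taken at the previous task's parameters."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def ewc_penalty(theta, theta_prev, fisher_diag, lam):
    """(lambda / 2) * sum_i F_ii * (theta_i - theta_prev_i)^2."""
    return 0.5 * lam * sum(
        f * (t - p) ** 2 for f, t, p in zip(fisher_diag, theta, theta_prev)
    )


def ewc_loss(new_task_loss, theta, theta_prev, fisher_diag, lam):
    """Total objective: new-task loss plus the Fisher-weighted penalty."""
    return new_task_loss + ewc_penalty(theta, theta_prev, fisher_diag, lam)


# Toy example: only the first parameter mattered for the old task.
grads = [[1.0, 0.0], [3.0, 0.0]]   # log-likelihood gradients on old-task samples
fisher = diagonal_fisher(grads)     # mean of squares per dimension
theta_prev = [0.0, 0.0]

# Moving the important parameter is penalized; moving the other one is free.
moved_important = ewc_penalty([1.0, 0.0], theta_prev, fisher, lam=2.0)
moved_unimportant = ewc_penalty([0.0, 1.0], theta_prev, fisher, lam=2.0)
```

Under these assumptions, setting every entry of `fisher_diag` to 1 recovers the plain $L_2$ distance that the summary mentions as the baseline, which penalizes movement in all directions equally.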

Keywords: deep-learning
