# Uncertainty-guided Continual Learning with Bayesian Neural Networks
Ebrahimi, Sayna; Elhoseiny, Mohamed; Darrell, Trevor; Rohrbach, Marcus. 2019.
## Introduction
Bayesian Neural Networks (BNNs) provide an intrinsic importance model based on weight uncertainty: variational inference approximates the posterior distribution over weights, using Monte Carlo sampling for gradient estimation. A BNN acts like an ensemble method in that it reduces prediction variance, but it uses only 2x the number of parameters (a mean and a standard deviation per weight).
The idea is to use the BNN's uncertainty to guide gradient descent so that it does not update the important weights when learning new tasks.
## Bayes by Backprop (BBB):
https://i.imgur.com/7o4gQMI.png
where $q(w|\theta)$ is our approximation of the true posterior $p(w|\mathcal{D})$. $q$ is typically Gaussian with diagonal covariance. We can optimize this via the ELBO:
https://i.imgur.com/OwGm20b.png
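Written out, the BBB objective is the variational free energy (the negative ELBO), which is standard and consistent with the formulation above:

$$\mathcal{L}(\theta) = \mathrm{KL}\big[q(w|\theta)\,\|\,p(w)\big] - \mathbb{E}_{q(w|\theta)}\big[\log p(\mathcal{D}|w)\big]$$

In practice this is approximated with Monte Carlo samples drawn via the reparameterization $w^{(i)} = \mu + \sigma \odot \epsilon^{(i)}$, $\epsilon^{(i)} \sim \mathcal{N}(0, I)$:

$$\mathcal{L}(\theta) \approx \sum_{i=1}^{n} \log q(w^{(i)}|\theta) - \log p(w^{(i)}) - \log p(\mathcal{D}|w^{(i)})$$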
## Uncertainty-guided CL with BNN (UCB):
In UCB, regularization is performed through the learning rate: the learning rate of each parameter, and hence its gradient update, becomes a function of its importance. The importance is set to be inversely proportional to the standard deviation $\sigma$ of $q(w|\theta)$.
Simply put, the more confident the posterior is about a certain weight, the less that weight will be updated.
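A minimal sketch of this learning-rate modulation, assuming a diagonal-Gaussian posterior with per-weight means `mu` and standard deviations `sigma` (the function name and the exact scaling constant are illustrative, not the paper's code):

```python
import numpy as np

def ucb_update(mu, sigma, grad_mu, base_lr=0.01):
    """Uncertainty-guided update: scale each mean's learning rate by its
    posterior std, so confident (low-sigma) weights barely move.
    Importance = 1/sigma, hence per-parameter lr is proportional to sigma."""
    per_param_lr = base_lr * sigma
    return mu - per_param_lr * grad_mu

mu = np.array([0.5, -1.2, 0.3])
sigma = np.array([0.01, 0.5, 1.0])  # first weight: posterior is very confident
grad = np.array([1.0, 1.0, 1.0])
new_mu = ucb_update(mu, sigma, grad)
# the confident (low-sigma) weight moves least, the uncertain one moves most
```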
The importance can also be used for weight pruning (a hard version of the soft learning-rate approach): the most important weights are frozen entirely for future tasks.
## Cartoon
https://i.imgur.com/6Ld79BS.png