Disclaimer: I am an author

# Intro

Experience replay (ER) and generative replay (GEN) are two effective continual learning strategies. In the former, samples from a stored memory are replayed to the continual learner to reduce forgetting. In the latter, old data is compressed with a generative model, and generated data is replayed to the continual learner. Both strategies assume that the replayed memories are sampled at random. But learning a new task doesn't cause **equal** interference (forgetting) on all previous tasks!

In this work, we propose a controlled sampling of the replays. Specifically, we retrieve the samples that are most interfered, i.e. whose prediction will be most negatively impacted by the foreseen parameter update. The method is called Maximally Interfered Retrieval (MIR).

## Cartoon for explanation

https://i.imgur.com/5F3jT36.png

Learning about dogs and horses might cause more interference on lions and zebras than on cars and oranges. Thus, replaying lions and zebras would be a more efficient strategy.

# Method

1) incoming data: $(X_t,Y_t)$

2) foreseen parameter update: $\theta^v = \theta - \alpha\nabla\mathcal{L}(f_\theta(X_t),Y_t)$

### applied to ER (ER-MIR)

3) Search for the top-$k$ samples $x$ in the stored memories using the criterion $$s_{MI}(x) = \mathcal{L}(f_{\theta^v}(x),y) - \mathcal{L}(f_{\theta}(x),y)$$

### or applied to GEN (GEN-MIR)

3) $$ \underset{Z}{\max} \, \mathcal{L}\big(f_{\theta^v}(g_\gamma(Z)),Y^*\big) - \mathcal{L}\big(f_{\theta}(g_\gamma(Z)),Y^*\big) $$ $$ \text{s.t.} \quad \|z_i - z_j\|_2^2 > \epsilon \quad \forall z_i,z_j \in Z \,\text{with} \, z_i\neq z_j $$

i.e. search in the latent space of a generative model $g_\gamma$ for samples that are the most forgotten given the foreseen update.

4) Then add these memories to the incoming data $X_t$ and train $f_\theta$

# Results

### qualitative

https://i.imgur.com/ZRNTWXe.png

Whilst learning 8s and 9s (first row), GEN-MIR mainly retrieves 3s and 4s (bottom two rows), which are similar to 8s and 9s respectively.

### quantitative

GEN-MIR was tested on MNIST SPLIT and Permuted MNIST, outperforming the baselines in both cases. ER-MIR was tested on MNIST SPLIT, Permuted MNIST and Split CIFAR10, outperforming the baselines in all cases.

# Other stuff

### (for avid readers)

We propose a hybrid method (AE-MIR) in which the generative model is replaced with an autoencoder to facilitate the compression of harder datasets, e.g. CIFAR10.
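
To make the ER variant of the retrieval step from the Method section concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: a tiny logistic-regression model stands in for $f_\theta$, the loss is binary cross-entropy, and the names `predict`, `loss`, `virtual_update` and `mir_retrieve` are hypothetical helpers introduced only for this illustration. It computes the foreseen parameters $\theta^v$ from the incoming batch, scores every stored sample by how much its loss would increase under $\theta^v$, and returns the top-$k$.

```python
import math


def predict(theta, x):
    """Stand-in model f_theta(x): logistic regression, sigmoid(w . x + b)."""
    w, b = theta
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def loss(theta, x, y):
    """Binary cross-entropy of a single sample (x, y), with y in {0, 1}."""
    p = min(max(predict(theta, x), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def virtual_update(theta, batch, alpha):
    """Foreseen update: theta^v = theta - alpha * grad of the mean batch loss."""
    w, b = theta
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in batch:
        err = predict(theta, x) - y  # gradient of BCE w.r.t. the pre-sigmoid logit
        for i, xi in enumerate(x):
            grad_w[i] += err * xi / len(batch)
        grad_b += err / len(batch)
    new_w = [wi - alpha * gi for wi, gi in zip(w, grad_w)]
    return new_w, b - alpha * grad_b


def mir_retrieve(theta, batch, memory, k, alpha=0.1):
    """Return the k memory samples with the largest loss increase, plus all scores."""
    theta_v = virtual_update(theta, batch, alpha)
    # s_MI(x) = L(f_{theta^v}(x), y) - L(f_theta(x), y)
    scores = [loss(theta_v, x, y) - loss(theta, x, y) for x, y in memory]
    ranked = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)
    return [memory[i] for i in ranked[:k]], scores


# Toy usage (all data below is made up for the illustration).
theta = ([0.0, 0.0], 0.0)                 # weights and bias
incoming = [([1.0, 0.0], 1)]              # (X_t, Y_t)
memory = [
    ([1.0, 0.0], 0),   # same input as the incoming sample, opposite label
    ([0.0, 1.0], 0),   # affected only through the bias
    ([1.0, 0.0], 1),   # agrees with the incoming sample, so its loss decreases
    ([-1.0, 0.0], 0),  # weight and bias changes cancel out
]
retrieved, scores = mir_retrieve(theta, incoming, memory, k=2, alpha=1.0)
print(retrieved)
print(scores)
```

In this toy setup the stored sample that conflicts most directly with the incoming one gets the highest score, while the sample that agrees with it gets a negative score and is not replayed. The retrieved samples would then be concatenated with the incoming batch for the actual training step (step 4).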
