Taming VAEs on ShortScience.org

Taming VAEs
Danilo Jimenez Rezende and Fabio Viola
arXiv e-Print archive - 2018 via Local arXiv
Keywords: stat.ML, cs.LG

Summary by Gavin Gray, 4 years ago

The paper provides derivations and intuitions about the learning dynamics of VAEs, based on observations about [$\beta$-VAEs][beta]. Using these, the authors derive GECO (Generalized ELBO with Constrained Optimization), an alternative way to constrain the training of VAEs that doesn't require the typical heuristics, such as warmup or adding noise to the data.

How exactly would this change a typical implementation? Typically, SGD is used to [optimize the ELBO directly](https://github.com/pytorch/examples/blob/master/vae/main.py#L91-L95). With GECO, I instead keep a moving average of my constraint $C$ (chosen based on what I want the VAE to do; it can be as simple as the reconstruction error measured against a tolerance parameter) and use it to update a Lagrange multiplier, which controls how heavily the constraint is weighted against the KL term in the loss. [This implementation](https://github.com/denproc/Taming-VAEs/blob/master/train.py#L83-L97) from a class project appears to be correct.
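
To make the bookkeeping concrete, here is a minimal sketch of the multiplier update as I understand it from the paper: minimize the KL term subject to $\mathbb{E}[C] \le 0$, with the multiplier $\lambda$ updated multiplicatively from the moving average of $C$. It is plain Python on scalars, not the authors' code. The class name `GecoMultiplier` and the parameters `alpha` (moving-average decay) and `step` (multiplier step size) are my own illustrative choices, and in a real autograd framework the gradient would flow through the per-batch constraint rather than the moving average.

```python
import math


class GecoMultiplier:
    """Hypothetical helper: tracks the constraint's moving average and lambda."""

    def __init__(self, alpha=0.99, step=0.01, lam=1.0):
        self.alpha = alpha  # decay of the moving average (assumed value)
        self.step = step    # step size of the multiplier update (assumed value)
        self.lam = lam      # Lagrange multiplier, kept positive by the exp update
        self.c_ma = None    # moving average of the constraint, unset until first batch

    def loss(self, kl, constraint):
        # Lagrangian for the model parameters: KL plus the weighted constraint.
        return kl + self.lam * constraint

    def update(self, constraint):
        # The first batch initialises the moving average; later batches blend in.
        if self.c_ma is None:
            self.c_ma = constraint
        else:
            self.c_ma = self.alpha * self.c_ma + (1.0 - self.alpha) * constraint
        # Multiplicative update: lambda grows while the averaged constraint is
        # violated (positive) and shrinks once it is satisfied (negative).
        self.lam *= math.exp(self.step * self.c_ma)
        return self.lam


if __name__ == "__main__":
    geco = GecoMultiplier(alpha=0.9, step=0.05)
    tolerance = 1.0  # illustrative tolerance on the reconstruction error
    for recon_error, kl in [(3.0, 0.5), (2.0, 0.8), (0.5, 1.2)]:
        constraint = recon_error - tolerance  # satisfied when <= 0
        total = geco.loss(kl, constraint)     # what the optimizer would minimise
        geco.update(constraint)
        print(f"loss={total:.3f} lambda={geco.lam:.3f}")
```

The point of the sketch is that nothing needs hand-tuning per epoch: while reconstruction is worse than the tolerance, $\lambda$ keeps growing and pushes the model towards reconstruction, and once the constraint holds on average, $\lambda$ decays and the KL term takes over.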

With the stabilization of training, I can't help but think of this as batchnorm for VAEs.

[beta]: https://openreview.net/forum?id=Sy2fzU9gl
