# Generative Adversarial Nets

Goodfellow, Ian J.; Pouget-Abadie, Jean; Mirza, Mehdi; Xu, Bing; Warde-Farley, David; Ozair, Sherjil; Courville, Aaron C.; Bengio, Yoshua. 2014.
#### Problem addressed:
Learning the data distribution with a non-parametric model.
The authors propose framing generative-model learning as an adversarial game between two players: a generative model and a discriminative model. The discriminative model learns to differentiate samples produced by the generative model from true samples drawn from the dataset, while the generative model tries to fool the discriminator. They prove that, under this adversarial objective, the global optimum of the generative model is exactly the data distribution. Good generative performance is demonstrated both quantitatively and qualitatively.
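Concretely, the adversarial game is the minimax problem from the paper:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

At the global optimum $p_g = p_{\mathrm{data}}$, and the optimal discriminator $D^*(x) = p_{\mathrm{data}}(x) / (p_{\mathrm{data}}(x) + p_g(x))$ equals $1/2$ everywhere.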
The idea of treating generative-model learning as a two-player adversarial game was implicit in the training of energy-based models, but this paper is the first to successfully demonstrate its power.
As with other non-parametric generative models, evaluating the probability the model assigns to data is not easy.
MNIST, TFD, CIFAR-10 (only qualitative)
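The quantitative numbers on these datasets come from fitting a Gaussian Parzen window to generated samples and reporting the log-likelihood of held-out test points under it (the kernel width is chosen on a validation set in the paper). A minimal numpy sketch of that estimator:

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Per-point log-likelihood of test_points under a Gaussian Parzen
    window fitted to samples (shape (n, d)) with bandwidth sigma."""
    n, d = samples.shape
    # squared distances from each test point to each kernel centre
    diffs = (test_points[:, None, :] - samples[None, :, :]) / sigma  # (m, n, d)
    exponents = -0.5 * np.sum(diffs ** 2, axis=2)                    # (m, n)
    # log-mean-exp over the n kernel centres (stable against underflow)
    max_e = exponents.max(axis=1, keepdims=True)
    log_mean = max_e[:, 0] + np.log(np.mean(np.exp(exponents - max_e), axis=1))
    # subtract the Gaussian normalising constant per dimension
    return log_mean - d * np.log(sigma * np.sqrt(2.0 * np.pi))
```

As the paper itself notes, this estimator has high variance and behaves poorly in high dimensions, but it was the best available option at the time.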
#### Additional remarks:
Presentation video available on cedar server
# Generative Adversarial Nets
* The paper proposes an adversarial approach for estimating generative models, where one model (the generative model) tries to learn the data distribution and another model (the discriminative model) tries to distinguish between samples from the generative model and samples from the true data distribution.
* [Link to the paper](https://arxiv.org/abs/1406.2661)
## Adversarial Net
* Two models: a Generative Model (*G*) and a Discriminative Model (*D*).
* Both are multi-layer perceptrons.
* *G* takes as input a noise variable *z* and outputs a data sample *x = G(z)*.
* *D* takes as input a data sample *x* and predicts the probability that it came from the true data rather than from *G*.
* *G* tries to minimise *log(1-D(G(z)))* while *D* tries to maximise the probability of correct classification.
* Think of it as a minimax game between two players; the global optimum is reached when *G* generates perfect samples and *D* cannot distinguish them from real data (thereby always returning 0.5 as the probability that a sample came from the true data).
* Alternate between *k* steps of training *D* and 1 step of training *G* so that *D* is maintained near its optimal solution.
* Early in training, the loss *log(1-D(G(z)))* saturates because *G* is weak and *D* rejects its samples with high confidence. Instead, maximise *log(D(G(z)))*, which provides much stronger gradients early on.
* The paper contains the theoretical proof for global optimum of the minimax game.
* Experiments on MNIST, the Toronto Face Database, and CIFAR-10.
* The generator net uses ReLU and sigmoid activations.
* The discriminator net uses maxout activations and dropout.
* Evaluation Metric
* Fit Gaussian Parzen window to samples obtained from *G* and compare log-likelihood.
* Computational advantages
* Backprop is sufficient for training with no need for Markov chains or performing inference.
* A variety of functions can be used in the model.
* Since *G* is updated only with gradients flowing through *D*, and never sees data examples directly, it is less likely to copy features of the training data verbatim.
* Can represent sharp (even degenerate) distributions.
* Disadvantages
* *D* must be kept well synchronised with *G* during training (in particular, *G* must not be trained too far ahead of *D*, or it may collapse many values of *z* to the same *x*).
* While *G* may learn to sample data points that are indistinguishable from true data, no explicit representation of the generator's distribution is obtained.
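The alternating scheme above (*k* discriminator steps, then one non-saturating generator step) can be sketched end-to-end on a toy 1-D problem. The linear generator and logistic-regression discriminator here are illustrative stand-ins for the paper's MLPs, and the learning rates and step counts are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy models: generator G(z) = a*z + b, discriminator D(x) = sigmoid(w*x + c).
theta_g = np.array([1.0, 0.0])   # (a, b)
theta_d = np.array([0.0, 0.0])   # (w, c)
data_mu, lr, k = 3.0, 0.05, 2    # true data mean; k D-steps per G-step

def G(z, th):
    return th[0] * z + th[1]

def D(x, th):
    return sigmoid(th[0] * x + th[1])

for step in range(2000):
    # k ascent steps on D's objective: E[log D(x)] + E[log(1 - D(G(z)))]
    for _ in range(k):
        x = rng.normal(data_mu, 1.0, size=64)
        fake = G(rng.normal(size=64), theta_g)
        gr = 1.0 - D(x, theta_d)       # d log D(x) / d logit on real data
        gf = -D(fake, theta_d)         # d log(1 - D(fake)) / d logit on fakes
        grad_w = np.mean(gr * x) + np.mean(gf * fake)
        grad_c = np.mean(gr) + np.mean(gf)
        theta_d += lr * np.array([grad_w, grad_c])
    # one ascent step on G with the non-saturating objective: E[log D(G(z))]
    z = rng.normal(size=64)
    fake = G(z, theta_g)
    gfake = (1.0 - D(fake, theta_d)) * theta_d[0]  # chain rule through D's logit
    theta_g += lr * np.array([np.mean(gfake * z), np.mean(gfake)])
```

After training, the mean of the generated samples should have moved toward `data_mu`, illustrating the alternating dynamics on a problem small enough to follow by hand.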
## Possible Extensions
* Conditional generative models.
* Inference network to predict *z* given *x*.
* Implement a stochastic extension of the deterministic [Multi-Prediction Deep Boltzmann Machines](https://papers.nips.cc/paper/5024-multi-prediction-deep-boltzmann-machines.pdf)
* Using discriminator net or inference net for feature selection.
* Accelerating training by ensuring better coordination between *G* and *D* or by determining better distributions to sample *z* from during training.