Conditional Generative Adversarial Nets. Mirza, Mehdi and Osindero, Simon. 2014
Paper summary (leopaillier)

_Objective:_ An unconditional GAN offers no control over which mode of the data is generated. This paper adds that control by conditioning on label data, and the approach generalizes to any kind of conditioning data.
_Dataset:_ [MNIST](http://yann.lecun.com/exdb/mnist/) and [MIRFLICKR](http://press.liacs.nl/mirflickr/).
#### Inner workings:
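The GAN loss is replaced with its conditional counterpart, in which both the discriminator and the generator also receive the conditioning data: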
[![screen shot 2017-04-24 at 10 07 25 am](https://cloud.githubusercontent.com/assets/17261080/25327832/e86f53fe-28d5-11e7-8694-6df8f2e1ef18.png)](https://cloud.githubusercontent.com/assets/17261080/25327832/e86f53fe-28d5-11e7-8694-6df8f2e1ef18.png)
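In case the screenshot above does not render, the conditional objective from the paper can be written out as follows. Here $y$ is the conditioning information (for example a class label), $x$ is a real sample, and $z$ is the generator's noise input:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]
```

The only difference from the unconditional GAN objective is that both $D$ and $G$ are conditioned on $y$.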
For implementation the only thing needed is to feed the label data to both the discriminator and generator:
[![screen shot 2017-04-24 at 10 07 18 am](https://cloud.githubusercontent.com/assets/17261080/25327826/e53ab4a8-28d5-11e7-8056-1518602d50c9.png)](https://cloud.githubusercontent.com/assets/17261080/25327826/e53ab4a8-28d5-11e7-8056-1518602d50c9.png)
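As a rough illustration of "feeding the label to both networks", the sketch below one-hot encodes a class label and appends it to each network's input. This is a minimal sketch, not the paper's exact wiring: plain concatenation at the input is one common way to condition, and the helper names (`one_hot`, `generator_input`, `discriminator_input`) and the noise size of 100 are assumptions made for the example.

```python
import random

NUM_CLASSES = 10   # MNIST digits 0-9
NOISE_DIM = 100    # assumed size of the noise vector z


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer class label as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(label, rng, noise_dim=NOISE_DIM):
    """Hypothetical helper: noise z followed by the one-hot label y."""
    z = [rng.random() for _ in range(noise_dim)]  # uniform noise in [0, 1)
    return z + one_hot(label)


def discriminator_input(x, label):
    """Hypothetical helper: sample x followed by the one-hot label y."""
    return list(x) + one_hot(label)


rng = random.Random(0)
g_in = generator_input(3, rng)
d_in = discriminator_input([0.0] * 784, 3)  # a blank 28x28 image, flattened
print(len(g_in), len(d_in))  # 110 794
```

Because the discriminator sees the label alongside the sample, it can reject a realistic image paired with the wrong label, which is what pushes the generator to respect the condition.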
Interesting at the time but not surprising now. There's not much more to the paper than what is in the summary.
# Conditional Generative Adversarial Nets
* Conditional version of [Generative Adversarial Nets (GAN)](https://gist.github.com/shagunsodhani/1f9dc0444142be8bd8a7404a226880eb) where both generator and discriminator are conditioned on some data **y** (class label or data from some other modality).
* [Link to the paper](https://arxiv.org/abs/1411.1784)
* Feed **y** into both the generator and discriminator as additional input layers such that **y** and input are combined in a joint hidden representation.
### Unimodal Setting
* Conditioning MNIST images on class labels.
* *z* (random noise) and **y** are mapped to ReLU hidden layers of sizes 200 and 1000 respectively, which are then combined into a joint ReLU hidden layer of dimensionality 1200.
* The discriminator maps *x* (input) and **y** to separate maxout layers; these feed a joint maxout layer, whose output is passed to a sigmoid layer.
* The results do not outperform the state of the art but do provide a proof of concept.
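The generator described above can be sketched as a forward pass in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the hidden sizes (200, 1000, 1200) come from the summary, the noise size of 100 and the 784-unit sigmoid output follow the paper's MNIST setup, and the weight initialisation and all function names are hypothetical. No training is shown.

```python
import math
import random


def dense(inputs, weights, biases):
    """Affine layer: weights holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def init_layer(n_in, n_out, rng, scale=0.05):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_generator(rng, noise_dim=100, num_classes=10,
                   z_hidden=200, y_hidden=1000, joint=1200, out_dim=784):
    """Create untrained parameters for the two-branch generator."""
    return {
        "z": init_layer(noise_dim, z_hidden, rng),
        "y": init_layer(num_classes, y_hidden, rng),
        "joint": init_layer(z_hidden + y_hidden, joint, rng),
        "out": init_layer(joint, out_dim, rng),
    }


def generate(params, z, y):
    h_z = relu(dense(z, *params["z"]))            # noise branch
    h_y = relu(dense(y, *params["y"]))            # label branch
    h = relu(dense(h_z + h_y, *params["joint"]))  # joint hidden layer
    return sigmoid(dense(h, *params["out"]))      # pixel intensities in (0, 1)


rng = random.Random(0)
# Tiny sizes keep this pure-Python demo fast; the defaults mirror the text.
params = make_generator(rng, noise_dim=8, num_classes=10,
                        z_hidden=4, y_hidden=6, joint=10, out_dim=16)
z = [rng.random() for _ in range(8)]
y = [0.0] * 10
y[3] = 1.0  # one-hot label for digit 3
sample = generate(params, z, y)
print(len(sample))  # 16
```

The label and the noise pass through separate hidden layers before being merged, so the network can learn a richer embedding of **y** than a raw one-hot vector appended to *z*.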
### Multimodal Setting
* Map Flickr images to labels (user tags), which is a one-to-many mapping.
* Extract image features with a convolutional model and text features with a language model.
* Generative Model
  * Map noise and convolutional features to a single 200-dimensional representation.
* Discriminator Model
  * Combine the representation of word vectors (corresponding to tags) and images.
## Future Work
* While the results are not so good, they do show the potential of Conditional GANs, especially in the multimodal setting.