Improving MMD-GAN Training with Repulsive Loss Function
Wei Wang and Yuan Sun and Saman Halgamuge
2018

Paper summary
richardwth
**TL;DR**: Rearranging the terms in Maximum Mean Discrepancy yields a much better loss function for the discriminator of Generative Adversarial Nets.
**Keywords**: Generative adversarial nets, Maximum Mean Discrepancy, spectral normalization, convolutional neural networks, Gaussian kernel, local stability.
**Summary**
Generative adversarial nets (GANs) are widely used to learn the data sampling process and are notoriously difficult to train. The training of GANs may be improved from three aspects: loss function, network architecture, and training process.
This study focuses on a loss function called the Maximum Mean Discrepancy (MMD), defined as:
$$
MMD^2(P_X,P_G)=\mathbb{E}_{P_X}k_{D}(x,x')+\mathbb{E}_{P_G}k_{D}(y,y')-2\mathbb{E}_{P_X,P_G}k_{D}(x,y)
$$
where $G$ and $D$ are the generator and discriminator networks, $x,x'$ are real samples, $y,y'$ are generated samples, and $k_D=k\circ D$ is a learned kernel that measures the similarity between two samples. Overall, MMD measures the distance between the real and the generated sample distributions. Thus, traditionally, the generator is trained to minimize $L_G=MMD^2(P_X,P_G)$, while the discriminator minimizes $L_D=-MMD^2(P_X,P_G)$.
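As a rough illustration of the definition above, the following pure-Python sketch estimates $MMD^2$ from two small batches of discriminator outputs. The Gaussian kernel with a fixed bandwidth `sigma`, the pair-averaging estimator that skips identical pairs, and all function names are assumptions made for this example; they are not taken from the authors' code.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two discriminator output vectors (assumed form of k)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(us, vs, same_set=False, sigma=1.0):
    """Average kernel value over pairs; skips i == j when both batches are the same."""
    total, count = 0.0, 0
    for i, u in enumerate(us):
        for j, v in enumerate(vs):
            if same_set and i == j:
                continue
            total += gaussian_kernel(u, v, sigma)
            count += 1
    return total / count

def mmd2(d_real, d_fake, sigma=1.0):
    """Estimate MMD^2 = E k(x,x') + E k(y,y') - 2 E k(x,y) on discriminator outputs."""
    k_xx = mean_kernel(d_real, d_real, same_set=True, sigma=sigma)
    k_yy = mean_kernel(d_fake, d_fake, same_set=True, sigma=sigma)
    k_xy = mean_kernel(d_real, d_fake, sigma=sigma)
    return k_xx + k_yy - 2.0 * k_xy

# Hypothetical discriminator outputs D(x) and D(G(z)) for three samples each.
d_real = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
d_fake = [[1.0, 1.1], [0.9, 1.2], [1.1, 0.8]]

loss_g = mmd2(d_real, d_fake)   # generator minimizes MMD^2
loss_d = -loss_g                # discriminator minimizes -MMD^2
```

In an actual GAN these quantities would be computed on mini-batches with an automatic-differentiation framework; the plain lists here only make the arithmetic explicit.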
This study makes three contributions:
- It argues that $L_D$ encourages the discriminator to ignore the fine details in real data. By minimizing $L_D$, $D$ attempts to maximize $\mathbb{E}_{P_X}k_{D}(x,x')$, the similarity between the discriminator outputs of real samples. Thus, $D$ has to focus on common features shared by real samples rather than fine details that separate them. This may slow down training. Instead, a repulsive loss is proposed, with no additional computational cost to MMD:
$$
L_D^{rep}=\mathbb{E}_{P_X}k_{D}(x,x')-\mathbb{E}_{P_G}k_{D}(y,y')
$$
- Inspired by the hinge loss, this study proposes a bounded Gaussian kernel for the discriminator to facilitate stable training of MMD-GAN.
- The spectral normalization method divides the weight matrix at each layer by its spectral norm to enforce that each layer is Lipschitz continuous. This study proposes a simple method to calculate the spectral norm of a convolutional kernel.
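The first and third contributions above can be sketched in a few lines of pure Python. `repulsive_loss` evaluates $L_D^{rep}$ on two batches of discriminator outputs using a plain Gaussian kernel (not the bounded kernel of the second contribution, whose exact form is not given in this summary). `spectral_norm` estimates the largest singular value of a dense weight matrix by power iteration; it is a generic illustration of spectral normalization and is not the paper's method for convolutional kernels. All names, the bandwidth `sigma`, and the iteration count are assumptions made for this example.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two discriminator output vectors (assumed form of k)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_pairwise_kernel(points, sigma=1.0):
    """Average kernel value over all distinct pairs within one batch."""
    n = len(points)
    total = sum(gaussian_kernel(points[i], points[j], sigma)
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def repulsive_loss(d_real, d_fake, sigma=1.0):
    """L_D^rep = E k(x,x') - E k(y,y'). Minimizing it lowers the similarity
    among real outputs (pushes them apart) and raises it among generated ones."""
    return mean_pairwise_kernel(d_real, sigma) - mean_pairwise_kernel(d_fake, sigma)

def spectral_norm(weight, num_iters=50):
    """Largest singular value of a dense matrix (list of rows) via power iteration
    on W^T W. Assumes a nonzero matrix and a start vector that is not orthogonal
    to the top right singular vector."""
    rows, cols = len(weight), len(weight[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(num_iters):
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        w = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_w = math.sqrt(sum(x * x for x in w))
        v = [x / norm_w for x in w]
    u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))

# Hypothetical discriminator outputs: tightly clustered real, spread-out generated.
d_real = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
d_fake = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
loss_d_rep = repulsive_loss(d_real, d_fake)

# Dividing a weight matrix by its spectral norm gives a matrix with spectral norm 1.
weight = [[3.0, 0.0], [0.0, 1.0]]
sigma_w = spectral_norm(weight)
weight_sn = [[w_ij / sigma_w for w_ij in row] for row in weight]
```

In this toy batch the real outputs are nearly identical, so `loss_d_rep` is large and positive; a discriminator minimizing it would be pushed to spread the real outputs apart.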
The results show the efficiency of the proposed methods on the CIFAR-10, STL-10, CelebA and LSUN-bedroom datasets. In the Appendix, we prove that MMD-GAN training using gradient methods is locally exponentially stable (a property that the Wasserstein loss does not have), and show that the repulsive loss works well with gradient penalty.
The paper has been accepted at ICLR 2019 ([OpenReview link](https://openreview.net/forum?id=HygjqjR9Km)). The code is available at [GitHub link](https://github.com/richardwth/MMD-GAN).

arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.LG, cs.CV, stat.ML

**First published:** 2018/12/24

**Abstract:** Generative adversarial nets (GANs) are widely used to learn the data sampling
process and their performance may heavily depend on the loss functions, given a
limited computational budget. This study revisits MMD-GAN that uses the maximum
mean discrepancy (MMD) as the loss function for GAN and makes two
contributions. First, we argue that the existing MMD loss function may
discourage the learning of fine details in data as it attempts to contract the
discriminator outputs of real data. To address this issue, we propose a
repulsive loss function to actively learn the difference among the real data by
simply rearranging the terms in MMD. Second, inspired by the hinge loss, we
propose a bounded Gaussian kernel to stabilize the training of MMD-GAN with the
repulsive loss function. The proposed methods are applied to the unsupervised
image generation tasks on CIFAR-10, STL-10, CelebA, and LSUN bedroom datasets.
Results show that the repulsive loss function significantly improves over the
MMD loss at no additional computational cost and outperforms other
representative loss functions. The proposed methods achieve an FID score of
16.21 on the CIFAR-10 dataset using a single DCGAN network and spectral
normalization.