Adversarial Self-Defense for Cycle-Consistent GANsAdversarial Self-Defense for Cycle-Consistent GANsBashkirova, Dina and Usman, Ben and Saenko, Kate2019
Paper summarydecodyngDomain translation - for example, mapping from a summer to a winter scene, or from a photorealistic image to an object segmentation map - is often performed by GANs through something called cycle consistency loss. This model works by having, for each domain, a generator to map domain A into domain B, and a discriminator to differentiate between real images from domain B, and those that were constructed through the cross-domain generator. With a given image in domain A, training happens by using the A→B generator to map it into domain B, and then then B→ A generator to map it back the original domain. These generators are then trained using two losses: one based on the B-domain discriminator, to push the generated image to look like it belongs from that domain, and another based on the L2 loss between the original domain A image, and the image you get on the other end when you translate it into B and back again.
This paper addresses an effect (identified originally in an earlier paper) where in domains with a many to one mapping between domains (for example, mapping a realistic scene into a domain segmentation map, where information is inherently lost by translating pixels to object outlines), the cycle loss incentivizes the model to operate in a strange, steganographic way, where it saves information about the that would otherwise be lost in the form of low-amplitude random noise in the translated image. This low-amplitude information can't be isolated, but can be detected in a few ways. First, we can simply examine images and notice that information that could not have been captured in the lower-information domain is being perfectly reconstructed. Second, if you add noise to the translation in the lower-information domain, in such a way as to not perceptibly change the translation to human eyes, this can cause the predicted image off of that translation to deteriorate considerably, suggesting that the model was using information that could be modified by such small additions of noise to do its reconstruction.
The authors of this paper ask whether it's possible to train models that don't perform this steganographic information-storing (which they call "self adversarial examples"). A typical approach to such a problem would be to train generators to perform translations with and without the steganographic information, but even though we can prove the existence of the information, we can't isolate it in a way that would allow us to remove it, and thus create these kinds of training pairs. The two tactics the paper uses are:
1) Simply training the generators to be able to translate a domain-mapped image with noise as well as one without noise, in the hope that this would train it not use information that can be interfered with by the application of such noise.
2) In addition to a L2 cycle loss, adding a discriminator to differentiate between the back-translated image and the original one. I believe the idea here is that if both of the encoders are adding in noise as a kind of secret signal, this would be a way for the discriminator to distinguish between the original and reconstructed image, and would thus be penalized.
They find that both of these methods reduce the use of steganographic information, as determined both by sensitivity to noise (where less sensitivity of reconstruction to noise means less use of coded information) and reconstruction honesty (which constrains accuracy of reconstruction in many to one domains to be no greater than the prediction that a supervised predictor could make given the image from the compressed domain