This a nice, compact paper testing a straightforward idea: can we use the contrastive loss structure so widespread in unsupervised learning as a framework for generating and training against adversarial examples? In the context of the adversarial examples literature, adversarial training - or, training against examples that were adversarially generated so as to minimize the loss of the model you're training - is the primary strategy used to train robust models (robust here in the sense of not being susceptible to said adversarial attacks). Typically, these attacks are generated with the use of class labels, since they are meant to attack supervised classifiers that assign a class label to an image. Therefore, the goal of the adversarial attack is to push down the probability of the correct class label (either in favor of a specific alternate class, or just in favor of any class that isn't the true one). However, labels are hard and expensive, so, one wonders: in the same way that you can learn representations from unlabeled data, can you also make those representations (otherwise referred to as "embeddings") robust in a similarly label-free way. This paper tests an approach that does so in a quite simple way, by just generating adversarial examples against your contrastive loss target. This works by: 1) Taking an image, and generating two augmentations (or transformations) of it. This is part of the standard contrastive pipeline 2) Applying an adversarial perturbation to one of those transformations, where the perturbation is optimized to maximize the contrastive loss (ability to differentiate an augmented version of the same image from augmented versions of other images) 3) Training on that adversarial sample to generate more robustness https://i.imgur.com/ttF6k1A.png And this simple approach appears to work quite well! They find that, in defending against supervised adversarial attacks, it performs comparably to supervised adversarial training, and that it has the added benefits of (1) slightly higher accuracy on clean examples (in general, robustness is known to decrease your clean-sample accuracy), and (2) better robustness against attack types other than the attack type used for the adversarial training. It also achieves better transfer performance (that is, adversarially training on one dataset, and then evaluating robustness on another) than a supervised method, when evaluated on both CIFAR10 → CIFAR100 and CIFAR100 → CIFAR10. This does make pretty good sense to me, since instance-level stability does seem like it's getting at a more fundamental set of invariances that to would transfer better to different distributions of classes.