This is a simple unsupervised method for learning word-level translation between the embedding spaces of two different languages. That's right -- unsupervised.

The basic motivating hypothesis is that there should be an isomorphism between the "semantic spaces" of different languages:

> we hypothesize that, if languages are used to convey thematically similar information in similar contexts, these random processes should be approximately isomorphic between languages, and that this isomorphism can be learned from the statistics of the realizations of these processes, the monolingual corpora, in principle without any form of explicit alignment.

If you squint a bit, you can make the more aggressive claim from this premise that there should be a nonlinear / MLP mapping between *word embedding spaces* that recovers the same correspondence. The author uses the adversarial autoencoder framework (AAE, from Makhzani et al. last year) to enforce a cross-lingual semantic mapping between word embedding spaces.

The basic setup for adversarial training between a source and a target language (a rough sketch of the training loop appears at the end of this note):

1. Sample a batch of words from the source language according to that language's word frequency distribution.
2. Sample a batch of words from the target language according to its word frequency distribution. (No relationship is enforced between the two samples here.)
3. Feed the word embeddings corresponding to the source words through an *encoder* MLP. This corresponds to the standard "generator" in a GAN setup.
4. Pass the generator output to a *discriminator* MLP along with the target-language word embeddings.
5. Also pass the generator output to a *decoder* MLP, which maps back to the source embedding space.
6. Update weights based on a combination of GAN loss + reconstruction loss.

### Does it work?

We don't really know. The paper is unfortunately short on evaluation --- we just see a few examples of success and failure from a trained model. One easy evaluation would be to compare lexical-mapping accuracy against the corpus frequency of the source word (also sketched below). I would bet this would reveal that the model hasn't done much more than learn to align a small set of high-frequency words.
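To make the numbered list above concrete, here is a minimal PyTorch-style sketch of the training loop. This is my own reading of the setup, not the paper's implementation: the dimensions, hyperparameters, loss weights, and the random stand-in embeddings and frequencies (`src_emb`, `tgt_emb`, `src_freq`, `tgt_freq`) are all illustrative placeholders.

```python
# Minimal sketch of the adversarial autoencoder setup described above.
# All names, sizes, and hyperparameters are assumptions, not the paper's.
import torch
import torch.nn as nn

dim = 300                      # embedding dimensionality (assumed)
n_src, n_tgt = 50_000, 50_000  # vocabulary sizes (assumed)

# Stand-ins for pretrained monolingual embeddings and unigram frequencies.
src_emb = torch.randn(n_src, dim)
tgt_emb = torch.randn(n_tgt, dim)
src_freq = torch.rand(n_src); src_freq /= src_freq.sum()
tgt_freq = torch.rand(n_tgt); tgt_freq /= tgt_freq.sum()

def mlp(d_in, d_out, hidden=512):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(), nn.Linear(hidden, d_out))

encoder = mlp(dim, dim)   # "generator": source embedding space -> target space
decoder = mlp(dim, dim)   # target space -> back to source space
discriminator = nn.Sequential(mlp(dim, 1), nn.Sigmoid())  # real target vs. encoded source

bce, mse = nn.BCELoss(), nn.MSELoss()
opt_g = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
batch, recon_weight = 128, 1.0

for step in range(10_000):
    # Steps 1-2: sample word batches independently, by each language's frequency.
    src_ids = torch.multinomial(src_freq, batch, replacement=True)
    tgt_ids = torch.multinomial(tgt_freq, batch, replacement=True)
    x_src, x_tgt = src_emb[src_ids], tgt_emb[tgt_ids]

    # Steps 3-4: encode source embeddings; train the discriminator to tell
    # them apart from real target-language embeddings.
    fake_tgt = encoder(x_src)
    d_loss = bce(discriminator(x_tgt), torch.ones(batch, 1)) + \
             bce(discriminator(fake_tgt.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Steps 5-6: train encoder + decoder to fool the discriminator (GAN loss)
    # and to reconstruct the original source embedding (reconstruction loss).
    fake_tgt = encoder(x_src)
    g_loss = bce(discriminator(fake_tgt), torch.ones(batch, 1)) + \
             recon_weight * mse(decoder(fake_tgt), x_src)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```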
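And here is a rough version of the evaluation I'd like to see: translate each source word by nearest neighbor in the mapped space and bucket precision@1 by the source word's corpus frequency. The gold dictionary `gold` is hypothetical -- training doesn't need one, but measuring accuracy does. This reuses the `encoder`, embeddings, and `src_freq` from the sketch above.

```python
# Hedged sketch of a frequency-bucketed lexical-mapping evaluation (my proposal,
# not something from the paper). `gold` maps source word ids to target word ids.
import torch
import torch.nn.functional as F

def precision_at_1_by_frequency(encoder, src_emb, tgt_emb, src_freq, gold, n_bins=5):
    """Return (bin index, precision@1) pairs, most-frequent bin first."""
    src_ids = torch.tensor(sorted(gold))
    with torch.no_grad():
        mapped = encoder(src_emb[src_ids])          # project into target space
        # Cosine nearest neighbor over the target vocabulary.
        sims = F.normalize(mapped, dim=1) @ F.normalize(tgt_emb, dim=1).T
        pred = sims.argmax(dim=1)
    correct = torch.tensor([int(pred[i]) == gold[int(w)] for i, w in enumerate(src_ids)])

    # Sort test words from most to least frequent, split into equal-size bins,
    # and report accuracy per bin.
    order = src_freq[src_ids].argsort(descending=True)
    return [(b, correct[chunk].float().mean().item())
            for b, chunk in enumerate(torch.chunk(order, n_bins))]
```

If the accuracy curve falls off a cliff outside the most frequent bin, that would support the suspicion that the model is only aligning a small set of high-frequency words.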