Dual Learning for Machine Translation on ShortScience.org

Dual Learning for Machine Translation
Yingce Xia and Di He and Tao Qin and Liwei Wang and Nenghai Yu and Tie-Yan Liu and Wei-Ying Ma
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CL

Summary by tqri 7 years ago

In this paper, the authors propose a framework for training two translation models, one for each direction, using large and readily available monolingual corpora.

Traditional machine translation models need a large parallel corpus to reach good quality, and such corpora are expensive to acquire. Meanwhile, the massive amount of available monolingual data is not fully utilized: monolingual corpora are typically used only to pretrain the NMT decoder RNN or to augment the initial parallel corpus with self-generated translations.

The authors embed the machine translation task into a reinforcement learning framework in which two agents act as native speakers of two different languages. The agents know little about each other's language and learn to translate by trying to communicate with each other.

**The two speakers**, `A` and `B`, each know their own language well; this is simulated by two well-trained language models, one for `A`'s language and one for `B`'s. Speaker `A` tries to convey a sentence $x$ to `B` by translating it into $y$ in `B`'s language. Since they do not know each other's language, `B` cannot be sure what `A` truly means by $y$. However, `B` can evaluate how sensible $y$ is as a sentence in `B`'s own language. Next, `B` reports this sensibility score to `A` and tries to recover what `A` meant by translating $y$ back into `A`'s language, giving $x'$. Similarly, `A` can evaluate how sensible $x'$ is in `A`'s own language.

In general, the original idea that `A` tried to convey is passed through a noisy channel to `B`, and then back to `A` through another noisy channel. In the framework, the first noisy channel is the `A-B` translation model and the second is the `B-A` translation model.
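The round trip described above can be sketched as a toy example. This is only a minimal illustration under strong assumptions: each "sentence" is a single token id, both translation models are tabular softmax policies, the reward is taken to be a weighted sum of `B`'s language-model score and the log-probability of reconstructing $x$, and the names `dual_step`, `theta_ab`, `theta_ba`, `lm_b_logprob`, `alpha` and `lr` are hypothetical and not taken from the summary.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_step(x, theta_ab, theta_ba, lm_b_logprob, alpha=0.5, lr=0.1, rng=random):
    """One round trip A -> B -> A for a single-token 'sentence' x (an int id).

    theta_ab[x] / theta_ba[y] are logit rows of toy tabular translation models
    (an assumption made for illustration only).
    """
    # A -> B: sample a translation y from the forward model.
    p = softmax(theta_ab[x])
    y = rng.choices(range(len(p)), weights=p)[0]

    # B scores how sensible y is in its own language (language-model reward).
    r_lm = lm_b_logprob[y]

    # B -> A: reconstruction reward is log P(x | y) under the backward model.
    q = softmax(theta_ba[y])
    r_rec = math.log(q[x])

    # Assumed combination of the two feedback signals.
    reward = alpha * r_lm + (1 - alpha) * r_rec

    # Policy-gradient (REINFORCE) step for the forward model:
    # reward * d/dlogits log p(y | x) = reward * (onehot(y) - p).
    for j in range(len(p)):
        theta_ab[x][j] += lr * reward * ((1.0 if j == y else 0.0) - p[j])

    # Likelihood-style step for the backward model:
    # (1 - alpha) * d/dlogits log q(x | y) = (1 - alpha) * (onehot(x) - q).
    for j in range(len(q)):
        theta_ba[y][j] += lr * (1 - alpha) * ((1.0 if j == x else 0.0) - q[j])

    return y, reward
```

A symmetric step starting from a sentence in `B`'s language would swap the roles of the two models. Real systems would use sequence models and several sampled translations per sentence rather than a single tabular draw.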

Think about how the first American to learn Chinese must have done it; I think it is intuitively similar to the principle in this work.

How do they derive Eq.7? There appear to be multiple steps here that the paper does not show. (Eq.6 seems straightforward, though.)
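One plausible reading, assuming Eq.7 is the gradient of the expected reward with respect to the parameters of the `A-B` model (written $\Theta_{AB}$ here, a notation assumed for this note), and assuming the reward $r(x, y)$ does not itself depend on $\Theta_{AB}$: the missing steps are the standard log-derivative (REINFORCE) identity, since the intermediate translation $y$ is sampled from the model.

$$
\begin{aligned}
\nabla_{\Theta_{AB}} \mathbb{E}_{y \sim P(\cdot \mid x;\, \Theta_{AB})}\big[r(x, y)\big]
&= \sum_{y} r(x, y)\, \nabla_{\Theta_{AB}} P(y \mid x;\, \Theta_{AB}) \\
&= \sum_{y} P(y \mid x;\, \Theta_{AB})\, r(x, y)\, \nabla_{\Theta_{AB}} \log P(y \mid x;\, \Theta_{AB}) \\
&= \mathbb{E}_{y \sim P(\cdot \mid x;\, \Theta_{AB})}\big[r(x, y)\, \nabla_{\Theta_{AB}} \log P(y \mid x;\, \Theta_{AB})\big]
\end{aligned}
$$

The second line uses $\nabla P = P\, \nabla \log P$. In practice the expectation would be approximated by averaging over a few sampled translations $y$.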
