_Objective:_ Perform domain adaptation by adapting several layers through a randomized representation rather than only the final layer, thereby aligning the joint distribution and not just the marginals.

_Dataset:_ [Office](https://cs.stanford.edu/%7Ejhoffman/domainadapt/) and [ImageCLEF-DA](http://imageclef.org/2014/adaptation).

## Inner-workings:

Essentially an improvement on [RevGrad](https://arxiv.org/pdf/1505.07818.pdf): instead of feeding only the last embedding layer to the discriminator, several layers are used. To avoid the dimension explosion that comes with taking the tensor product of all layers, they use a randomized multilinear representation instead:

[![screen shot 20170601 at 5 35 46 pm](https://cloud.githubusercontent.com/assets/17261080/26687736/cff2044646f011e7918eb60baa10aa67.png)](https://cloud.githubusercontent.com/assets/17261080/26687736/cff2044646f011e7918eb60baa10aa67.png)

Where:

* d is the dimension of the embedding (they use 1024)
* R is a random matrix whose elements each have zero mean and a variance of 1 (Bernoulli, Gaussian and Uniform are tried)
* z^l is the l-th layer
* ⊙ represents the Hadamard product

In practice they don't use all layers, just the last 3-4 layers for ResNet and AlexNet.

## Architecture:

[![screen shot 20170601 at 5 34 44 pm](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d9846f011e789d115452cbb527e.png)](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d9846f011e789d115452cbb527e.png)

They use the usual losses for domain adaptation, with:

* F minimizing the cross-entropy loss for classification and trying to reduce the gap between the distributions (indicated by D).
* D maximizing the gap between the distributions.
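
The two objectives for F and D can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the paper's code: D is assumed to output the probability that an example comes from the source domain, the trade-off weight `lam` is a hypothetical parameter in the style of RevGrad, and the function names are made up for this example.

```python
import math

def cross_entropy(class_probs, label):
    """Classification loss for one source example: -log p(true class)."""
    return -math.log(class_probs[label])

def domain_loss(p_source_on_source, p_source_on_target):
    """Mean binary cross-entropy of D. Each input is D's predicted
    probability that an example comes from the source domain."""
    terms = [-math.log(p) for p in p_source_on_source]
    terms += [-math.log(1.0 - p) for p in p_source_on_target]
    return sum(terms) / len(terms)

def objectives(class_probs, labels, p_source_on_source, p_source_on_target, lam=1.0):
    """Return (loss_F, loss_D); each network minimizes its own loss.
    lam is an assumed trade-off weight, not a value from the paper."""
    cls = sum(cross_entropy(p, y) for p, y in zip(class_probs, labels)) / len(labels)
    dom = domain_loss(p_source_on_source, p_source_on_target)
    loss_D = dom              # low when D separates the two domains well
    loss_F = cls - lam * dom  # F classifies well while making D's job harder
    return loss_F, loss_D

# Hypothetical predictions for two source and two target examples.
loss_F, loss_D = objectives(
    class_probs=[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    labels=[0, 1],
    p_source_on_source=[0.9, 0.6],
    p_source_on_target=[0.3, 0.5],
)
print(f"loss_F={loss_F:.3f}  loss_D={loss_D:.3f}")
```

In an actual network the opposing signs are usually obtained with a gradient reversal layer between F and D rather than with two separate loss values; the sketch only shows which quantity each side pushes in which direction.
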
[![screen shot 20170601 at 5 40 53 pm](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff7046f111e7917d05129ab190b0.png)](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff7046f111e7917d05129ab190b0.png)

## Results:

Improves on state-of-the-art results for most tasks in the datasets, and is very easy to implement on top of any pretrained network out of the box.
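
To give a sense of how little code the randomized multilinear representation needs, here is a minimal sketch in plain Python. It assumes the representation is the Hadamard product of the projections R^l z^l scaled by 1/sqrt(d), which is my reading of the definitions listed under the formula. The helper names, the toy activations, and the choice to sample each R once and keep it fixed are assumptions made for illustration.

```python
import math
import random

def random_matrix(d, dim, rng):
    """d x dim matrix with zero-mean, unit-variance Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]

def matvec(R, z):
    """Project the activation vector z with the matrix R."""
    return [sum(r * x for r, x in zip(row, z)) for row in R]

def randomized_multilinear(layers, matrices):
    """(1 / sqrt(d)) * Hadamard product over layers of (R^l z^l)."""
    d = len(matrices[0])
    fused = [1.0] * d
    for R, z in zip(matrices, layers):
        projected = matvec(R, z)
        fused = [f * p for f, p in zip(fused, projected)]
    return [f / math.sqrt(d) for f in fused]

rng = random.Random(0)
d = 8                        # the summary reports d = 1024; kept small here
z1 = [0.2, -1.0, 0.5]        # hypothetical activations of one layer
z2 = [1.0, 0.0, -0.3, 0.7]   # hypothetical activations of another layer
# Assumption: the random matrices are sampled once and then kept fixed.
Rs = [random_matrix(d, len(z1), rng), random_matrix(d, len(z2), rng)]

t = randomized_multilinear([z1, z2], Rs)
print(len(t))  # size of the fused vector fed to the discriminator
```

The fused vector always has d entries, however many layers are combined, whereas the explicit tensor product grows as the product of the layer sizes. A real implementation would use an array library and batch the projections; the nested lists here only keep the example dependency-free.
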