[FashionGAN] works as follows. Given an input image of a person and a sentence describing an outfit, the model tries to "redress" the person in the image.
The generator is a stacked, two-stage architecture:
* The first stage takes as input a low-resolution version of the input image's segmentation map (obtained independently, so it conveys body shape and pose rather than the original clothing) together with the design encoding, and generates a full **human segmentation map** consistent with the described outfit.
* In the second stage, another generator, also conditioned on the design encoding, renders the final image: it uses the generated segmentation map to add region-specific texture to each body part.
![FashionGAN Model](https://i.imgur.com/DzwB8xm.png "FashionGAN model")
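The two-stage data flow above can be sketched in code. This is a minimal illustration of the conditioning structure only, not the paper's networks: the dimensions, the random linear maps standing in for the learned generators, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration (not from the paper).
DESIGN_DIM = 100   # design encoding derived from the sentence description
N_CLASSES = 7      # segmentation classes (e.g. hair, face, top, ...)
LOW_RES, HI_RES = 4, 16

def stage1_generator(low_res_seg, design_enc):
    """Stage 1: coarse segmentation + design encoding -> full-resolution
    segmentation map (per-pixel class probabilities). A fixed random
    linear map stands in for the learned network."""
    x = np.concatenate([low_res_seg.ravel(), design_enc])
    W = rng.standard_normal((HI_RES * HI_RES * N_CLASSES, x.size)) * 0.01
    logits = (W @ x).reshape(HI_RES, HI_RES, N_CLASSES)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # softmax over classes

def stage2_generator(seg_map, design_enc):
    """Stage 2: render an RGB image conditioned on the generated
    segmentation map and the same design encoding, so texture can
    vary per region."""
    x = np.concatenate([seg_map.ravel(), design_enc])
    W = rng.standard_normal((HI_RES * HI_RES * 3, x.size)) * 0.01
    return np.tanh((W @ x).reshape(HI_RES, HI_RES, 3))  # pixels in [-1, 1]

# One forward pass through both stages.
low_res_seg = rng.random((LOW_RES, LOW_RES, N_CLASSES))
design_enc = rng.standard_normal(DESIGN_DIM)
seg = stage1_generator(low_res_seg, design_enc)
img = stage2_generator(seg, design_enc)
print(seg.shape, img.shape)
```

The key point the sketch captures is that the design encoding conditions *both* stages, while the second stage sees the original image only through the generated segmentation map.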
The authors collected sentence descriptions for a subset of the [DeepFashion dataset] (79k examples).