Efficient softmax approximation for GPUs
Grave, Edouard and Joulin, Armand and Cissé, Moustapha and Grangier, David and Jégou, Hervé
2017
Paper summary by marek
The paper modifies the two-level hierarchical softmax to make it more efficient. A model of computational cost is used to choose the number of words assigned to each class. In addition, the most frequent words are placed directly at the top level, alongside the class nodes, rather than inside a class.
https://i.imgur.com/dbKS3gh.png
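The structure described above can be illustrated with a small sketch. It assumes a single tail class and a purely linear cost model (cost proportional to the number of outputs computed, with no fixed GPU overhead), which is a simplification rather than the paper's exact equation. All function names (`word_probs`, `expected_cost`, `best_head_size`) are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_probs(head_scores, tail_scores):
    """Probabilities for every word under a two-level softmax.

    head_scores: scores for the frequent words, followed by ONE extra
                 score for the tail class (same level as the words).
    tail_scores: scores for the rare words inside the tail class.
    """
    head = softmax(head_scores)
    p_tail_class = head[-1]
    tail = softmax(tail_scores)
    # A rare word's probability is P(tail class) * P(word | tail class).
    return head[:-1] + [p_tail_class * p for p in tail]


def expected_cost(freqs, k_head):
    """Expected number of outputs computed per token (assumed linear cost).

    freqs: unigram probabilities sorted from most to least frequent.
    The head (k_head words + 1 class node) is always evaluated; the tail
    is evaluated only when the target word falls in it.
    """
    p_tail = sum(freqs[k_head:])
    return (k_head + 1) + p_tail * (len(freqs) - k_head)


def best_head_size(freqs):
    """Brute-force search for the head size with the lowest expected cost."""
    return min(range(1, len(freqs)), key=lambda k: expected_cost(freqs, k))


# Toy Zipf-like vocabulary of 1000 words (illustrative data only).
raw = [1.0 / rank for rank in range(1, 1001)]
total = sum(raw)
freqs = [r / total for r in raw]
k = best_head_size(freqs)
```

Under this toy cost model, `expected_cost(freqs, k)` is lower than the 1000 outputs a full softmax would compute for every token, because the large tail is only evaluated for the minority of tokens that are rare words.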