Efficient softmax approximation for GPUs
Paper summary: The paper modifies the two-level hierarchical softmax to make it more efficient on GPUs. A model of the computational cost is used to choose the number of words assigned to each class. In addition, the most frequent words are placed at the top level, alongside the class nodes, instead of inside a class, so their probability needs only a single softmax evaluation. https://i.imgur.com/dbKS3gh.png
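The structure described above, with frequent words sharing the top level with class nodes, can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single tail class, word ids sorted by decreasing frequency, and randomly initialised weights, and the names `TwoClusterSoftmax`, `head_size`, and `log_prob` are hypothetical, chosen only for this example.

```python
import math
import random


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoClusterSoftmax:
    """Illustrative two-level softmax (assumption: one tail class only).

    Word ids 0..head_size-1 are the frequent words and live in the head.
    The head has one extra output that stands for "some word in the tail".
    """

    def __init__(self, vocab_size, head_size, dim, seed=0):
        rng = random.Random(seed)
        self.head_size = head_size
        # head_size word rows plus one row for the tail class node
        self.head_w = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(head_size + 1)]
        # one row per rare word
        self.tail_w = [[rng.gauss(0, 0.1) for _ in range(dim)]
                       for _ in range(vocab_size - head_size)]

    def log_prob(self, hidden, word):
        head_lp = log_softmax([dot(w, hidden) for w in self.head_w])
        if word < self.head_size:
            # Frequent word: only the small head softmax is evaluated.
            return head_lp[word]
        # Rare word: log p(tail class) + log p(word | tail class).
        tail_lp = log_softmax([dot(w, hidden) for w in self.tail_w])
        return head_lp[self.head_size] + tail_lp[word - self.head_size]


model = TwoClusterSoftmax(vocab_size=50, head_size=10, dim=4)
hidden = [0.3, -0.2, 0.5, 0.1]
total = sum(math.exp(model.log_prob(hidden, w)) for w in range(50))
```

In this sketch a frequent word costs `head_size + 1` dot products, while a rare word additionally pays for the tail softmax. Because frequent words make up most of the tokens in a corpus, the average cost per token stays close to the cost of the head alone, and `total` sums to 1 up to floating-point error, so the two levels still define a valid distribution over the vocabulary.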

