#### Introduction

* Introduces fastText, a simple and highly efficient approach to text classification.
* On par with deep learning classifiers in terms of accuracy, while being many orders of magnitude faster to train and evaluate.
* [Link to the paper](http://arxiv.org/abs/1607.01759v3)
* [Link to code](https://github.com/facebookresearch/fastText)

#### Architecture

* Built on top of linear models with a rank constraint and a fast loss approximation.
* Word representations are averaged into a text representation, which is fed to a linear classifier.
* The text representation acts as a hidden state that is shared among features and classes.
* A softmax layer computes a probability distribution over the predefined classes.
* Computing the full softmax costs $O(kh)$, where $k$ is the number of classes and $h$ is the dimension of the text representation. This becomes expensive when $k$ is large.

##### Hierarchical Softmax

* Based on a Huffman coding tree.
* Reduces the complexity to $O(h \log k)$.
* The top $T$ results from the tree can be computed efficiently, at a cost of $O(\log T)$, using a binary heap.

##### N-gram Features

* Instead of explicitly modeling word order, uses a bag of n-grams as additional features. This keeps the model efficient without sacrificing accuracy.
* Uses the [hashing trick](https://arxiv.org/pdf/0902.2206.pdf) to map n-grams quickly and memory-efficiently.

#### Experiments

##### Sentiment Analysis

* fastText benefits from using bigrams.
* Outperforms [charCNN](http://arxiv.org/abs/1502.01710v5) and [charCRNN](http://arxiv.org/abs/1602.00367v1), but performs slightly worse than [VDCNN](http://arxiv.org/abs/1606.01781v1).
* Orders of magnitude faster in terms of training time.
* Note: fastText does not use pretrained word embeddings.

##### Tag Prediction

* fastText with bigrams outperforms [TagSpace](http://emnlp2014.org/papers/pdf/EMNLP2014194.pdf).
* fastText is up to 600 times faster at test time.
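The architecture and n-gram features described above can be illustrated with a small, pure-Python sketch. Unigram embeddings and hashed bigram embeddings are averaged into a text representation. That representation goes through a linear classifier with a full softmax, which is the $O(kh)$ case. The model is trained with plain SGD on cross-entropy. This is a sketch under stated assumptions, not the fastText library's implementation or API. The names `FastTextSketch` and `ngram_ids`, the CRC32 hash, the bucket layout (bigram rows placed after the vocabulary rows) and the uniform initialization are all hypothetical choices made for illustration.

```python
import math
import random
import zlib


def ngram_ids(tokens, vocab, num_buckets):
    """Return row ids for known unigrams followed by hashed bigram buckets."""
    ids = [vocab[t] for t in tokens if t in vocab]
    for a, b in zip(tokens, tokens[1:]):
        # Hashing trick (assumption: CRC32): all bigrams share num_buckets rows
        # placed after the vocabulary rows, so no bigram dictionary is stored.
        bucket = zlib.crc32(f"{a} {b}".encode("utf-8")) % num_buckets
        ids.append(len(vocab) + bucket)
    return ids


class FastTextSketch:
    """Hypothetical model: averaged embeddings -> linear classifier -> softmax."""

    def __init__(self, vocab, num_buckets, dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.vocab, self.num_buckets = vocab, num_buckets
        rows = len(vocab) + num_buckets
        # A: input embeddings, one row per word or bigram bucket.
        self.A = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]
        # B: weights of the linear classifier, one row per class.
        self.B = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_classes)]

    def _hidden(self, ids):
        # Text representation: average of the selected embedding rows.
        dim = len(self.A[0])
        h = [0.0] * dim
        for i in ids:
            for j in range(dim):
                h[j] += self.A[i][j]
        n = max(len(ids), 1)
        return [x / n for x in h]

    def _softmax(self, h):
        # One dot product per class: this is the O(k * h) cost.
        scores = [sum(w * x for w, x in zip(row, h)) for row in self.B]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def predict_proba(self, tokens):
        ids = ngram_ids(tokens, self.vocab, self.num_buckets)
        return self._softmax(self._hidden(ids))

    def train_step(self, tokens, label, lr=0.1):
        """One SGD step on the cross-entropy loss -log p(label)."""
        ids = ngram_ids(tokens, self.vocab, self.num_buckets)
        h = self._hidden(ids)
        p = self._softmax(h)
        # d loss / d score_c = p_c - 1[c == label]
        g = [pc - (1.0 if c == label else 0.0) for c, pc in enumerate(p)]
        # Gradient w.r.t. h, computed before B is modified.
        dh = [sum(g[c] * self.B[c][j] for c in range(len(g))) for j in range(len(h))]
        for c, gc in enumerate(g):
            for j in range(len(h)):
                self.B[c][j] -= lr * gc * h[j]
        # Each averaged row receives 1/n of the hidden-state gradient.
        n = max(len(ids), 1)
        for i in ids:
            for j in range(len(h)):
                self.A[i][j] -= lr * dh[j] / n


if __name__ == "__main__":
    vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "bad": 4}
    model = FastTextSketch(vocab, num_buckets=16, dim=8, num_classes=2)
    tokens = "the movie was great".split()
    for _ in range(50):
        model.train_step(tokens, label=1, lr=0.5)
    print(model.predict_proba(tokens))
```

Storing bigrams in a fixed number of hashed buckets bounds memory regardless of how many distinct bigrams occur. The cost is that unrelated bigrams occasionally collide and share a row.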
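The hierarchical softmax bullets can be illustrated with a Huffman tree built from class frequencies. Here each internal node holds a vector, and a sigmoid of its dot product with the text representation decides between the left and right child. A class's probability is the product of these decisions along its root-to-leaf path, so scoring one class costs $O(h \cdot \text{depth})$ rather than $O(kh)$. The top-$T$ query is sketched as a best-first search over a binary heap, which works because a node's probability upper-bounds the probability of every leaf below it. This is one plausible reading of the binary-heap remark, not necessarily the paper's exact procedure. The class frequencies and random vectors are made up, and all function names are hypothetical.

```python
import heapq
import itertools
import math
import random


def build_huffman_tree(freqs):
    """Return (root, num_inner). Leaves: ("leaf", class_id);
    internal nodes: ("inner", node_id, left, right)."""
    tie = itertools.count()  # tie-breaker so node tuples are never compared
    heap = [(f, next(tie), ("leaf", c)) for c, f in enumerate(freqs)]
    heapq.heapify(heap)
    num_inner = 0
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # merge the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), ("inner", num_inner, left, right)))
        num_inner += 1
    return heap[0][2], num_inner


def leaf_paths(root):
    """Map class -> [(inner_node_id, +1.0 for left / -1.0 for right), ...]."""
    paths, stack = {}, [(root, [])]
    while stack:
        node, path = stack.pop()
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, nid, left, right = node
            stack.append((left, path + [(nid, 1.0)]))
            stack.append((right, path + [(nid, -1.0)]))
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def class_probability(path, node_vectors, hidden):
    # Product of binary decisions along the path: O(h * depth) work.
    p = 1.0
    for nid, direction in path:
        p *= sigmoid(direction * dot(node_vectors[nid], hidden))
    return p


def top_t(root, node_vectors, hidden, t):
    """Best-first search; leaves are popped in non-increasing probability."""
    tie = itertools.count()
    frontier = [(-1.0, next(tie), root)]  # negated probabilities -> max-heap
    results = []
    while frontier and len(results) < t:
        neg_p, _, node = heapq.heappop(frontier)
        if node[0] == "leaf":
            results.append((node[1], -neg_p))
            continue
        _, nid, left, right = node
        s = sigmoid(dot(node_vectors[nid], hidden))
        heapq.heappush(frontier, (neg_p * s, next(tie), left))
        heapq.heappush(frontier, (neg_p * (1.0 - s), next(tie), right))
    return results


if __name__ == "__main__":
    freqs = [50, 20, 15, 10, 5]  # made-up class frequencies
    root, num_inner = build_huffman_tree(freqs)
    paths = leaf_paths(root)
    print(len(paths[0]), len(paths[4]))  # 1 4
    rng = random.Random(0)
    node_vectors = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(num_inner)]
    hidden = [rng.gauss(0, 1) for _ in range(4)]
    print(top_t(root, node_vectors, hidden, t=2))
```

Because frequent classes sit near the root, the average path is short. On the made-up frequencies above, the most frequent class needs a single sigmoid while the rarest needs four.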
