#### Introduction

* Large scale natural language understanding task - predict text values given a knowledge base.
* Accompanied by a large dataset generated using Wikipedia.
* [Link to the paper](http://www.aclweb.org/anthology/P/P16/P16-1145.pdf)

#### Dataset

* WikiReading dataset built using Wikidata and Wikipedia.
* Wikidata consists of statements of the form (property, value) about different items.
    * 80M statements, 16M items and 884 properties.
* These statements are grouped by items to get (item, property, answer) tuples where the answer is a set of values.
* Items are further replaced by their Wikipedia documents to generate 18.58M statements of the form (document, property, answer).
* Task is to predict answer given document and property.
* Properties are divided into 2 classes:
    * **Categorical properties** - properties with a small number of possible answers. E.g. gender.
    * **Relational properties** - properties with unique answers. E.g. date of birth.
    * This classification is done on the basis of the entropy of answer distribution.
    * Properties with entropy less than 0.7 are classified as categorical properties.
* Answer distribution has a small number of very high-frequency answers (head) and a large number of answers with very small frequency (tail).
* 30% of the answers do not appear in the training set and must be inferred from the document.

#### Models

##### Answer Classification

* Consider WikiReading as classification task and treat each answer as a class label.

###### Baseline

* Linear model over Bag of Words (BoW) features.
* Two BoW vectors computed - one for the document and other for the property. These are concatenated into a single feature vector.

###### Neural Networks Method

* Encode property and document into a joint representation which is fed into a softmax layer.
* **Average Embeddings BoW**
    * Average the BoW embeddings for documents and property and concatenate to get joint representation.
* **Paragraph Vectors**
    * As a variant of the previous method, encode document as a paragraph vector.
* **LSTM Reader**
    * LSTM reads the property and document sequence, word-by-word, and uses the final state as joint representation.
* **Attentive Reader**
    * Use attention mechanism to focus on relevant parts of the document for a given property.
* **Memory Networks**
    * Maps a property p and list of sentences x<sub>1</sub>, x<sub>2</sub>, ..., x<sub>n</sub> into a joint representation by attention over the sentences in the document.

##### Answer Extraction

* For relational properties, it makes more sense to model the problem as information extraction than classification.
* **RNNLabeler**
    * Use an RNN to read the sequence of words and estimate if a given word is part of the answer.
* **Basic SeqToSeq (Sequence to Sequence)**
    * Similar to LSTM Reader but augmented with a second RNN to decode answer as a sequence of words.
* **Placeholder SeqToSeq**
    * Extends Basic SeqToSeq to handle OOV (Out of Vocabulary) words by adding placeholders to the vocabulary.
    * OOV words in the document and answer are replaced by placeholders so that input and output sentences are a mixture of words and placeholders only.
* **Basic Character SeqToSeq**
    * Property encoder RNN reads the property, character-by-character, and transforms it into a fixed length vector.
    * This becomes the initial hidden state for the second layer of a 2-layer document encoder RNN.
    * Final state of this RNN is used by answer decoder RNN to generate answer as a character sequence.
* **Character SeqToSeq with pretraining**
    * Train a character-level language model on input character sequences from the training set and use the weights to initialize the first layer of encoder and decoder.

#### Experiments

* Evaluation metric is F1 score (harmonic mean of precision and recall).
* All models perform well on categorical properties with neural models outperforming others.
* In the case of relational properties, SeqToSeq models have a clear edge.
* SeqToSeq models also show a great deal of balance between relational and categorical properties.
* Language model pretraining enhances the performance of character SeqToSeq approach.
* Results demonstrate that end-to-end SeqToSeq models are most promising for WikiReading-like tasks.
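
Since each answer in the dataset is a set of values, the F1 metric mentioned above can be illustrated with a small sketch. This is not the paper's evaluation code: the function names `answer_set_f1` and `mean_f1` are hypothetical, and both exact string matching of answers and averaging the per-instance scores are assumptions made for illustration.

```python
def answer_set_f1(predicted, gold):
    """F1 between a predicted and a gold collection of answer strings.

    Assumption: answers are compared as sets using exact string match.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty inputs and predictions with no overlap.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions, golds):
    """Average the per-instance F1 (an assumed aggregation scheme)."""
    scores = [answer_set_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predicting `{"a", "b"}` when the gold answer is `{"a"}` gives precision 0.5 and recall 1.0, so the F1 is 2/3.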