#### Introduction

* Build a supervised reading comprehension data set from a news corpus.
* Compare the performance of neural models and state-of-the-art natural language processing models on the reading comprehension task.
* [Link to the paper](http://arxiv.org/abs/1506.03340v3)

#### Reading Comprehension

* Estimate the conditional probability $p(a|c, q)$, where $c$ is a context document, $q$ is a query related to the document, and $a$ is the answer to that query.

#### Dataset Generation

* Use online newspapers (CNN and DailyMail) and their matching summaries.
* Parse summaries and bullet points into Cloze-style questions.
* Generate a corpus of document-query-answer triplets by replacing one entity at a time with a placeholder.
* The data is anonymised and randomised using coreference systems, abstract entity markers, and random permutation of the entity markers.
* The processed data set is more focused on evaluating reading comprehension, as models cannot exploit co-occurrence.

#### Models

##### Baseline Models

* **Majority Baseline**
    * Picks the most frequently observed entity in the context document.
* **Exclusive Majority**
    * Picks the most frequently observed entity in the context document which is not observed in the query.

##### Symbolic Matching Models

* **Frame-Semantic Parsing**
    * Parse the sentence to find predicates to answer questions like "who did what to whom".
    * Extract entity-predicate triples $(e_1, V, e_2)$ from the query $q$ and the context document $d$.
    * Resolve queries using rules like `exact match`, `matching entity` etc.
* **Word Distance Benchmark**
    * Align the placeholder of the Cloze-form question with each possible entity in the context document and calculate the distance between the question and the context around the aligned entity.
    * Sum the distances from every word in $q$ to its nearest aligned word in $d$.

##### Neural Network Models

* **Deep LSTM Reader**
    * Test the ability of Deep LSTM encoders to handle significantly longer sequences.
    * Feed the document-query pair as a single large document, one word at a time.
    * Use a Deep LSTM cell with skip connections from the input to every hidden layer and from every hidden layer to the output.
* **Attentive Reader**
    * Employ an attention mechanism to overcome the bottleneck of a fixed-width hidden vector.
    * Encode the document and the query using separate bidirectional single-layer LSTMs.
    * The query encoding is obtained by concatenating the final forward and backward outputs.
    * The document encoding is obtained as a weighted sum of output vectors (each obtained by concatenating the forward and backward outputs).
    * The weights can be interpreted as the degree to which the network attends to a particular token in the document.
    * The model is completed by defining a non-linear combination of the document and query embeddings.
* **Impatient Reader**
    * As an add-on to the Attentive Reader, the model can re-read the document as each query token is read.
    * The model accumulates information from the document as each query token is seen and finally outputs a joint document-query representation in the form of a non-linear combination of the document embedding and the query embedding.

#### Result

* The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
* The Frame-Semantic pipeline does not scale to cases where several sentences (and thus frames) are needed to answer a query.
* Moreover, it provides poor coverage, as many relations do not adhere to the default predicate-argument structure.
* The Word Distance approach outperformed the Frame-Semantic approach because there was significant lexical overlap between the query and the document.
* The paper also includes heat maps over the context documents to visualise the attention mechanism.
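
The Attentive Reader's weighted-sum document encoding, described under Neural Network Models, can be illustrated with a minimal pure-Python sketch. It assumes each token's score is a learned vector applied to `tanh` of a linear mix of the token output and the query encoding, followed by a softmax over tokens. The function name `attentive_read`, the weight names (`W_ym`, `W_um`, `w_ms`, `W_rg`, `W_ug`), and the random inputs are illustrative stand-ins. They are not trained parameters or real LSTM outputs.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_read(doc_outputs, query_enc, W_ym, W_um, w_ms, W_rg, W_ug):
    """Return (attention weights, document encoding r, joint embedding g)."""
    query_part = matvec(W_um, query_enc)
    scores = []
    for y in doc_outputs:
        # Mix this token's output with the query, then score it.
        m_t = [math.tanh(a + b) for a, b in zip(matvec(W_ym, y), query_part)]
        scores.append(sum(w * m for w, m in zip(w_ms, m_t)))
    weights = softmax(scores)
    dim = len(doc_outputs[0])
    # Document encoding: attention-weighted sum of token outputs.
    r = [sum(s * y[i] for s, y in zip(weights, doc_outputs)) for i in range(dim)]
    # Joint embedding: non-linear combination of document and query encodings.
    g = [math.tanh(a + b)
         for a, b in zip(matvec(W_rg, r), matvec(W_ug, query_enc))]
    return weights, r, g


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden, tokens = 4, 5  # toy sizes, chosen only for the demo
doc_outputs = random_matrix(tokens, hidden, rng)   # stand-in for biLSTM outputs
query_enc = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
W_ym = random_matrix(hidden, hidden, rng)
W_um = random_matrix(hidden, hidden, rng)
w_ms = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
W_rg = random_matrix(hidden, hidden, rng)
W_ug = random_matrix(hidden, hidden, rng)

weights, r, g = attentive_read(doc_outputs, query_enc,
                               W_ym, W_um, w_ms, W_rg, W_ug)
print("attention weights:", [round(w, 3) for w in weights])
```

The returned `weights` are the per-token values that the attention heat maps would visualise. In a trained model, the joint embedding `g` would then be scored against each candidate entity.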