This paper explores the problem of question answering based on natural text. While this has been explored recently in the context of Memory Networks, the problems tackled so far have been synthetically generated. In this paper, the authors propose to extract more realistic question answering examples from news sites, by treating the main body of a news article as the content (the "facts") and extracting questions from the article's bullet point summaries. Specifically, by detecting the entities in these bullet points and replacing them with a question placeholder (e.g. "Producer X will not press charges"), they are able to generate queries which, while not grammatically questions, do require performing a form of question answering. Thanks to this procedure, two large *supervised* datasets are created, with many thousands of questions, based on the CNN and Daily Mail news sites.

Then, the authors investigate neural network based systems for solving this task. They consider a fairly simple Deep LSTM network, which is first fed the article's content and then the query. They also consider two architectures that incorporate an attentional mechanism, based on softmax weighting. The first ("Attentive Reader") attends once over the document (i.e. uses a single softmax weight vector) while the second ("Impatient Reader") attends after every word in the query (akin to the soft attention architecture in the "Show, Attend and Tell" paper). These neural network architectures are also compared with simpler baselines, which are closer to what a more "classical" statistical NLP solution might look like.

Results on both datasets demonstrate that the neural network approaches have superior performance, with the attentional models being significantly better than the simpler Deep LSTM model.

#### My two cents

This is a welcome development in the research on reasoning models based on neural networks.
I've always thought it was unfortunate that the best benchmark available was based on synthetically generated cases. This work fixes this problem in a really clever way, while still being able to generate a large amount of training data.

Particularly clever is the random permutation of entity markers when processing each case. Without it, a system could simply use general statistics on words to answer questions (e.g. just from the query "The hi-tech bra that helps you beat breast X" it's obvious that "cancer" is an excellent answer). With the permutation, the system is forced to exploit the content of the article, thus ensuring that the benchmark is indeed measuring the system's question-answering abilities.

Since the dataset itself is an important contribution of this paper, I hope the authors release it publicly in the near future.

The evaluation of the different neural architectures is also very thorough. The non-neural baselines are reasonable and the comparison between the neural nets is itself interesting, providing more evidence that the softmax weighted attentional mechanism (which has been gaining in popularity) indeed brings something over a regular LSTM approach.
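
To make the entity-marker trick discussed above more concrete, here is a minimal sketch of how one example could be anonymized with a fresh random marker assignment. It is only an illustration under stated assumptions: the helper name `anonymize`, the marker strings `@entityN` and `@placeholder`, and the toy sentence are all made up for this note, and entity detection is assumed to have been done already. It is not the authors' actual pipeline.

```python
import random
import re


def anonymize(document, query, answer, entities, rng):
    """Turn one (document, query, answer) triple into its anonymized form.

    Hypothetical helper: the name, signature and marker strings are
    illustrative assumptions, not the paper's actual pipeline.
    """
    # Draw a fresh random entity -> marker assignment for this example only.
    ids = list(range(len(entities)))
    rng.shuffle(ids)
    marker = {ent: f"@entity{i}" for ent, i in zip(entities, ids)}

    # Match longer names first so a short name cannot clobber a longer one.
    names = sorted(entities, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))

    def substitute(text):
        return pattern.sub(lambda m: marker[m.group(0)], text)

    # The answer entity is hidden in the query; all other entities get markers.
    masked_query = query.replace(answer, "@placeholder")
    return substitute(document), substitute(masked_query), marker[answer]


# Toy, made-up example (not taken from the datasets).
document = "Producer Alice Smith said she will not press charges against Bob Jones."
query = "Producer Alice Smith will not press charges"
entities = ["Alice Smith", "Bob Jones"]

doc_anon, query_anon, answer_marker = anonymize(
    document, query, "Alice Smith", entities, random.Random(0)
)
print(doc_anon)
print(query_anon)
print(answer_marker)
```

Because the marker attached to a given entity changes from one example to the next, the marker carries no information by itself, so the answer can only be recovered by relating the query to the document.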