Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge
Nicholas Locascio, Karthik Narasimhan, Eduardo DeLeon, Nate Kushman, and Regina Barzilay
2016
Paper summary by shagunsodhani

#### Introduction
* The paper addresses the task of translating natural language queries into regular expressions without using domain-specific knowledge.
* Proposes a methodology for collecting a large corpus of (regular expression, natural language) pairs.
* Reports a performance gain of 19.6% over previous state-of-the-art models.
* [Link to the paper](http://arxiv.org/abs/1608.03000v1)

#### Architecture
* LSTM-based sequence-to-sequence neural network (with attention)
* Six layers
    * One word embedding layer
    * Two encoder layers
    * Two decoder layers
    * One dense output layer
* Attention over the encoder layer.
* Dropout with a probability of 0.25.
* Trained for 20 epochs with a minibatch size of 32 and a learning rate of 1 (with a decay rate of 0.5).

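
To make the attention step concrete, here is a minimal sketch of how a decoder state could be turned into a weighted summary of the encoder states. The summary does not say which scoring function the paper uses, so dot-product scoring is an assumption here, and the names `softmax` and `attend` as well as the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Assumption: dot-product scoring between the decoder state and each
    encoder state; the paper may use a different scoring function.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy example: three encoder positions with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
```

In this toy input the second and third encoder positions receive the same weight, and both outweigh the first, because they align better with the decoder state.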
#### Dataset Generation
* Created a public dataset, **NL-RX**, with 10K (regular expression, natural language) pairs.
* Two-step generate-and-paraphrase approach
    * Generate step
        * Use a handcrafted grammar to translate regular expressions into natural language.
    * Paraphrase step
        * Crowdsource the task of rewriting the rigid descriptions as more natural expressions.
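
The generate step can be pictured with a toy example. The sketch below renders one small expression tree both as a regular expression and as a rigid English description, which is the kind of pair a crowd worker would then paraphrase. The operators, phrases, and function names are hypothetical and are not the grammar used in the paper.

```python
import re

# Hypothetical mini-grammar: a regex is a nested tuple (operator, *arguments).
# Leaf operators map to a (regex fragment, rigid phrase) pair.
LEAVES = {
    "digit": ("[0-9]", "a number"),
    "lower": ("[a-z]", "a lower-case letter"),
    "upper": ("[A-Z]", "an upper-case letter"),
}

def to_regex(node):
    """Render the tree as a Python regular expression string."""
    op = node[0]
    if op in LEAVES:
        return LEAVES[op][0]
    if op == "word":
        return "(?:" + re.escape(node[1]) + ")"
    if op == "or":
        return "(?:" + to_regex(node[1]) + "|" + to_regex(node[2]) + ")"
    if op == "at_least":
        return "(?:" + to_regex(node[1]) + "{" + str(node[2]) + ",})"
    if op == "starts_with":
        return "(?:" + to_regex(node[1]) + ".*)"
    if op == "contains":
        return "(?:.*" + to_regex(node[1]) + ".*)"
    raise ValueError("unknown operator: " + op)

def describe(node):
    """Render the same tree as a rigid natural language description."""
    op = node[0]
    if op in LEAVES:
        return LEAVES[op][1]
    if op == "word":
        return "the string '" + node[1] + "'"
    if op == "or":
        return describe(node[1]) + " or " + describe(node[2])
    if op == "at_least":
        return describe(node[1]) + ", " + str(node[2]) + " or more times"
    if op == "starts_with":
        return "starts with " + describe(node[1])
    if op == "contains":
        return "contains " + describe(node[1])
    raise ValueError("unknown operator: " + op)

# One synthetic pair: (regular expression, rigid description).
tree = ("contains", ("at_least", ("digit",), 2))
pair = (to_regex(tree), describe(tree))
print(pair)
```

Because both renderings walk the same tree, every generated description is guaranteed to correspond to its regular expression; only the wording is left for the paraphrase step to improve.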

#### Results
* Evaluation metric
    * Functional equality check (called DFA-Equal), since the same regular expression can be written in many ways.
* The proposed architecture outperforms both baselines: a nearest-neighbor classifier using bag of words (BoWNN) and Semantic-Unify.
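
To illustrate why a functional check is needed rather than string comparison, the sketch below compares two regular expressions by their behavior. Note that this is a swapped-in, bounded brute-force approximation and not the DFA-based check itself: it only tests strings up to a fixed length over a small alphabet, so it can miss differences that an exact automaton comparison would catch. The function name `find_counterexample` and its parameters are hypothetical.

```python
import re
from itertools import product

def find_counterexample(regex_a, regex_b, alphabet="ab1", max_len=4):
    """Return a string on which the two regexes disagree, or None.

    Bounded approximation: only strings over `alphabet` with length
    0..max_len are tried, so None means "no difference found", not a
    proof of equivalence.
    """
    pattern_a = re.compile(regex_a)
    pattern_b = re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            matches_a = pattern_a.fullmatch(candidate) is not None
            matches_b = pattern_b.fullmatch(candidate) is not None
            if matches_a != matches_b:
                return candidate
    return None

# Different surface forms, same language on all tested strings.
same = find_counterexample("(a|b)*", "[ab]*")

# Different languages: only the first pattern accepts the empty string.
different = find_counterexample("a*", "a+")
```

Here `"(a|b)*"` and `"[ab]*"` differ as strings but agree on every tested input, which is exactly the case an exact-match metric would wrongly penalize.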

arXiv e-Print archive, 2016
First published: 2016/08/09

Abstract: This paper explores the task of translating natural language queries into regular expressions which embody their meaning. In contrast to prior work, the proposed neural model does not utilize domain-specific crafting, learning to translate directly from a parallel corpus. To fully explore the potential of neural models, we propose a methodology for collecting a large corpus of regular expression, natural language pairs. Our resulting model achieves a performance gain of 19.6% over previous state-of-the-art models.