Paper summary
arjoonn
Since all algorithms can be modeled as multiple conditional branch operations, this paper allows you to incorporate conventional algorithms into neural networks by dynamically building the neural computation graph based on outputs of these algorithms.
They obtain near-SOTA results on Quora Duplicate Questions and SQuAD without heavily fine-tuning the architecture for each problem.
One limitation is that the algorithm itself is not affected by the learning process and so cannot be learned.
This method provides a nice way to incorporate non-differentiable code into differentiable computation graphs, which can then be trained with backprop-like learning mechanisms.
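To make the idea concrete, here is a minimal sketch in plain Python, not the paper's code. A non-differentiable routine makes a discrete choice, the differentiable computation is built only along the chosen branch, and the gradient is written out by hand. The names `select_branch` and `forward_and_grad`, and the use of an argmax as the "algorithm", are assumptions made for illustration.

```python
def select_branch(scores):
    """Non-differentiable step: an ordinary algorithm returning a discrete index.

    Hypothetical stand-in (a plain argmax) for any conventional algorithm.
    """
    return max(range(len(scores)), key=lambda i: scores[i])


def forward_and_grad(weights, x, scores):
    """Build the computation for this input only after the algorithm has run.

    output = weights[k] * x, where k is chosen by select_branch.
    Returns the output and the gradient d(output)/d(weights).
    """
    k = select_branch(scores)      # treated as a constant during backprop
    output = weights[k] * x
    grad = [0.0] * len(weights)
    grad[k] = x                    # only the selected branch receives gradient
    return output, grad


weights = [0.5, -1.0, 2.0]
out, grad = forward_and_grad(weights, x=3.0, scores=[0.1, 0.9, 0.3])
print(out, grad)
```

No gradient reaches `select_branch` itself, which matches the limitation noted above: the algorithm steers the graph but is not changed by training.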
First published: 2017/12/07
Abstract: Neural architecture is a purely numeric framework, which fits the data as a
continuous function. However, lacking of logic flow (e.g. \textit{if, for,
while}), traditional algorithms (e.g. \textit{Hungarian algorithm, A$^*$
searching, decision tree algorithm}) could not be embedded into this paradigm,
which limits the theories and applications. In this paper, we reform the
calculus graph as a dynamic process, which is guided by logic flow. Within our
novel methodology, traditional algorithms could empower numerical neural
network. Specifically, regarding the subject of sentence matching, we
reformulate this issue as the form of task-assignment, which is solved by
Hungarian algorithm. First, our model applies BiLSTM to parse the sentences.
Then Hungarian layer aligns the matching positions. Last, we transform the
matching results for soft-max regression by another BiLSTM. Extensive
experiments show that our model outperforms other state-of-the-art baselines
substantially.
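The alignment step described in the abstract can be sketched as an assignment problem over per-token vectors. This is an illustration under stated assumptions, not the paper's implementation: the token vectors are made-up stand-ins for BiLSTM outputs, dot-product similarity is an assumed scoring function, and a brute-force search over permutations replaces the Hungarian algorithm so the example needs only the standard library. The function name `align` is hypothetical.

```python
from itertools import permutations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def align(sent_a, sent_b):
    """Return the one-to-one alignment with the highest total similarity.

    Brute force over permutations, so suitable for toy sizes only.
    Assumes len(sent_a) <= len(sent_b).
    """
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(len(sent_b)), len(sent_a)):
        score = sum(dot(sent_a[i], sent_b[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(enumerate(best_perm))


# Toy "encoded sentences": two tokens each, two dimensions per token.
a = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.0, 1.0], [1.0, 0.0]]
pairs = align(a, b)
print(pairs)
```

For realistic sentence lengths a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual choice. The returned index pairs are the kind of matching result that, per the abstract, is passed on to a second BiLSTM.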