# Neural Turing Machines
## Paper summary

A Neural Turing Machine (NTM) consists of a neural network controller interacting with a working memory bank in a learnable manner. This is analogous to a computer: the controller plays the role of the CPU (with hidden activations as registers) and the memory matrix plays the role of RAM.

Key ideas:

- The controller (a modified RNN) interacts with the external world via input and output vectors, and with memory via read and write "heads".
- The "read" vector is a convex combination of the row vectors of $M_t$ (the memory matrix at time $t$): $r_t = \sum_i w_t(i) M_t(i)$, where $w_t$ is a vector of weightings over the $N$ memory locations.
- "Writing" is decomposed into 1) erasing and 2) adding. The write head produces an erase vector $e_t$ and an add vector $a_t$ along with the weighting $w_t$ over memory locations: $M_t(i) = M_{t-1}(i)[1 - w_t(i) e_t] + w_t(i) a_t$. The erase and add vectors control which components of memory are updated, while the weightings $w_t$ control which locations are updated.
- Weight vectors are produced by an addressing mechanism:
  - Content-based addressing: each head produces a length-$M$ key $k_t$ that is compared to each row $M_t(i)$ by cosine similarity, scaled by a temperature parameter. The weightings are normalized with a softmax.
  - Location-based addressing:
    - Interpolation: each head produces an interpolation gate $g_t$ that blends the content weighting of the current time step with the weighting from the previous time step: $w^g_t = g_t w^c_t + (1 - g_t) w_{t-1}$.
    - Shift: circular convolution (modulo $N$) with a shift weighting distribution, for example a softmax over integer shift positions (say 3 locations).
    - Sharpening: each head emits $\gamma_t \geq 1$ to sharpen the final weighting.
- Experiments on copy, repeat-copy, associative recall, N-gram emulation, and priority sort tasks.

## Links

- [Attention and Augmented RNNs](
- [NTM-Lasagne](
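The read and erase/add write operations above can be sketched in NumPy. This is a minimal illustration, not a full NTM; the sizes `N` and `M` and all variable names are assumptions for the example.

```python
import numpy as np

N, M = 8, 4  # assumed sizes: N memory locations, each a length-M vector

rng = np.random.default_rng(0)
memory = rng.random((N, M))   # M_t: the memory matrix
w = np.full(N, 1.0 / N)       # w_t: weighting over locations (sums to 1)

# Read: convex combination of memory rows, r_t = sum_i w_t(i) M_t(i)
r = w @ memory                # shape (M,)

# Write: erase then add, M_t(i) = M_{t-1}(i)[1 - w_t(i) e_t] + w_t(i) a_t
e = rng.random(M)             # erase vector e_t, components in (0, 1)
a = rng.random(M)             # add vector a_t
memory = memory * (1 - np.outer(w, e)) + np.outer(w, a)
```

Note that both operations are fully differentiable, which is what lets the whole system be trained end to end with gradient descent.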

Summary by Denny Britz 3 years ago
