Hybrid computing using a neural network with dynamic external memory
Paper summary

The paper introduces an approach to modeling external memory in a differentiable way. Memory is modeled as an $N \times W$ matrix: it has $N$ independent storage locations, each able to store a datum of length $W$.

### DNC Architecture

A controller network is trained to use the memory. Memory access is modeled in cycles: at each time-step $t$ the network emits read and write commands as part of its output. The commands are then processed, and the data that was read is given to the network as part of its input at time-step $t+1$. Any deep learning architecture can serve as the controller network (e.g. a standard feed-forward CNN). The paper uses a deep LSTM as the controller.

https://storage.googleapis.com/deepmind-live-cms/images/dnc_figure1.width-1500.png

### Memory Interaction

The control commands allow the network to interact with the data in three different ways:

1. Content lookup: access is controlled by how closely the content of a location matches a key emitted by the controller.
2. Sequential read access: for each read vector $v$ the network receives as input at time $t$, it can access the data that was written directly after $v$.
3. Usage-based write access: a "usage" vector $u \in [0,1]^N$ models the importance of each location. The network can choose to write data to the location with the lowest usage level. $u$ can be decreased ("erased memory") at each time-step.

#### Differentiable Operations

All memory operations are modeled in a differentiable way, so that the entire model can be trained end-to-end using gradient descent. To do this, all read commands are mapped to a soft read weighting $w^r \in [0,1]^N$ such that $\sum^N_{i=1} w^r_i = 1$. The read vector $r$ is then defined as the weighted sum over the rows of the memory: $r = \sum^N_{i=1} M[i, \cdot] \, w^r_i$.

### Experiments

The network was trained to perform a variety of memory-intensive tasks. The results show that the network is able to learn to take advantage of the external memory.
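
As a rough illustration of the content lookup and the soft read described under "Memory Interaction" and "Differentiable Operations", the following minimal sketch builds a weighting over memory rows and uses it to compute a read vector. The summary does not say how "closeness" to the key is measured; this sketch assumes cosine similarity sharpened by a softmax. The function names (`content_weighting`, `soft_read`) and the `strength` parameter are hypothetical and chosen only for illustration.

```python
import math


def content_weighting(memory, key, strength=1.0):
    """Return a soft weighting over memory rows based on similarity to `key`.

    Assumption: similarity is cosine similarity, turned into a distribution
    by a softmax; `strength` controls how peaked the distribution is.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / (norm + 1e-8)  # epsilon guards against all-zero rows

    scores = [strength * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_read(memory, weighting):
    """Compute r = sum_i M[i, :] * w_i, a weighted sum over memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weighting, memory))
            for j in range(width)]


# Toy memory with N = 3 locations, each storing W = 2 values.
M = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

w = content_weighting(M, key=[1.0, 0.0], strength=10.0)
r = soft_read(M, w)
```

Because every step is a smooth function of the memory and the key, gradients can flow through the read. A one-hot weighting such as `[0, 1, 0]` reduces `soft_read` to an ordinary lookup of a single row.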
https://www.youtube.com/watch?v=B9U8sI7TcMY
