A simple neural network module for relational reasoning on ShortScience.org

A simple neural network module for relational reasoning
Adam Santoro and David Raposo and David G. T. Barrett and Mateusz Malinowski and Razvan Pascanu and Peter Battaglia and Timothy Lillicrap
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CL, cs.LG

Summary by kdubovikov 6 years ago

The paper proposes a reusable neural network module to "reason about the relations between entities and their properties":

$$ RN(O) = f_\phi \left( \sum_{i,j} g_\theta(o_i, o_j) \right), $$

- $O$ is a set of input objects $\{o_1, o_2, \dots, o_n\}$ with $o_i \in \mathbb{R}^m$
- $g_\theta$ is a neural network (MLP) that approximates the object-to-object relation function
- $f_\phi$ is a neural network (MLP) that maps the summed pairwise object-to-object relations to the desired output
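
The formula can be sketched in a few lines of NumPy. This is a minimal illustration with random, untrained weights rather than the authors' implementation; the helper names (`init_mlp`, `mlp`, `relation_network`) and all layer sizes are assumptions chosen for the example. The sum runs over all ordered pairs, including $i = j$, since the formula as written does not exclude them.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for an MLP with the given layer sizes (hypothetical helper)."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP: ReLU on hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b

def relation_network(objects, g_params, f_params):
    """RN(O) = f_phi(sum_{i,j} g_theta(o_i, o_j)) for objects of shape (n, m)."""
    n, _ = objects.shape
    # Build all n*n ordered pairs (o_i, o_j) as rows of length 2m.
    left = np.repeat(objects, n, axis=0)            # o_i: rows 0,0,...,1,1,...
    right = np.tile(objects, (n, 1))                # o_j: rows 0,1,...,0,1,...
    pairs = np.concatenate([left, right], axis=1)   # (n*n, 2m)
    relations = mlp(g_params, pairs)                # (n*n, d)
    pooled = relations.sum(axis=0)                  # (d,)
    return mlp(f_params, pooled)

rng = np.random.default_rng(0)
n, m, d, n_out = 5, 4, 16, 3   # illustrative sizes, not taken from the paper
objects = rng.normal(size=(n, m))
g_params = init_mlp(rng, [2 * m, 32, d])
f_params = init_mlp(rng, [d, 32, n_out])
output = relation_network(objects, g_params, f_params)
print(output.shape)
```

Batching the pairs into one matrix keeps $g_\theta$ a single matrix multiply per layer; a double loop over `i` and `j` computes the same quantity more slowly.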

RNs operate on sets: because the pairwise relations are summed, the output is invariant to the order of objects in the input.
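
The invariance is easy to check numerically. The sketch below uses single-layer stand-ins for $g_\theta$ and $f_\phi$ with random weights (an assumption made to keep the example short) and compares them against a hypothetical order-sensitive baseline that flattens the objects in sequence.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 6, 3, 8                       # illustrative sizes
W_g = rng.normal(size=(2 * m, d))       # stand-in for g_theta (one tanh layer)
W_f = rng.normal(size=(d, 2))           # stand-in for f_phi (one linear layer)
W_flat = rng.normal(size=(n * m, 2))    # weights for the order-sensitive baseline

def rn(objects):
    """Sum g over all ordered pairs of objects, then apply f."""
    pooled = np.zeros(d)
    for o_i in objects:
        for o_j in objects:
            pooled += np.tanh(np.concatenate([o_i, o_j]) @ W_g)
    return pooled @ W_f

def order_sensitive(objects):
    """Baseline for contrast: flatten the objects in order, then one linear layer."""
    return objects.reshape(-1) @ W_flat

objects = rng.normal(size=(n, m))
shuffled = objects[::-1]                # same set of objects, reversed order

rn_same = np.allclose(rn(objects), rn(shuffled))
baseline_same = np.allclose(order_sensitive(objects), order_sensitive(shuffled))
print(rn_same, baseline_same)
```

Reordering the objects only reorders the terms of the sum, so the RN output matches up to floating-point rounding, while the flattened baseline generally changes.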

In terms of architecture, the RN module sits at the tail of a neural network and takes its input objects in the form of CNN or LSTM embeddings.

This work is evaluated on several tasks where it performs well, with the paper reporting super-human accuracy on CLEVR:
- CLEVR and Sort-of-CLEVR - question answering about an image
- bAbI - text-based question answering
- Dynamic physical systems - MuJoCo simulations with physical relations between entities

Can directed relations, i.e. one-way relationships with a distinct domain and range rather than bidirectional ones, also be learned here?

Summary by Abhishek Das 6 years ago

This paper describes using Relation Networks (RN) for reasoning about relations between objects/entities.
RN is a plug-and-play module: although it expects object representations as input,
the semantics of what an object is need not be specified, so object representations
can be convolutional layer feature vectors, entity embeddings from text, or something else.
The feedforward network is free to discover relations between objects (as opposed to being
hand-assigned specific relations).

- At its core, RN has two parts:
	- a feedforward network `g` that operates on pairs of object representations,
	for all possible pairs, with all pairwise outputs pooled via element-wise addition
	- a feedforward network `f` that operates on the pooled features for the downstream
	task, with everything trained end-to-end

- When dealing with pixels (as in the CLEVR experiment), individual object representations are
spatially distinct convolutional layer features (say, 196 object representations of 512 dimensions each for VGG conv5).
The other experiment on CLEVR uses explicit factored object state representations with 3D coordinates,
shape, material, color, and size.

- For bAbI, object representations are LSTM encodings of supporting sentences.

- For VQA tasks, `g` also conditions its processing on the question encoding, since the relations
that are relevant for figuring out the answer are question-dependent.
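
The image and VQA points above can be sketched together: each spatial cell of an (H, W, C) feature map becomes one object, and the question encoding `q` is appended to every pair before `g` is applied, i.e. `g(o_i, o_j, q)`. This is only a data-flow illustration under stated assumptions: the helper names (`feature_map_to_objects`, `conditioned_pairs`), the small sizes, and the random single-layer stand-ins for `g` and `f` are hypothetical, and a real CNN and LSTM would supply `feature_map` and `question`.

```python
import numpy as np

def feature_map_to_objects(feature_map):
    """Treat each spatial cell of an (H, W, C) feature map as one C-d object."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)

def conditioned_pairs(objects, question):
    """Rows [o_i, o_j, q] for all ordered pairs; shape (n*n, 2*c + len(q))."""
    n = objects.shape[0]
    left = np.repeat(objects, n, axis=0)
    right = np.tile(objects, (n, 1))
    q = np.broadcast_to(question, (n * n, question.shape[0]))
    return np.concatenate([left, right, q], axis=1)

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 8))   # small stand-in for a CNN output
question = rng.normal(size=(5,))           # stand-in for an LSTM question encoding

objects = feature_map_to_objects(feature_map)       # (16, 8)
pairs = conditioned_pairs(objects, question)        # (256, 21)

# One-layer stand-ins for g and f, just to show the data flow.
W_g = rng.normal(scale=0.1, size=(pairs.shape[1], 32))
W_f = rng.normal(scale=0.1, size=(32, 10))
pooled = np.maximum(pairs @ W_g, 0.0).sum(axis=0)   # element-wise sum over pairs
answer_logits = pooled @ W_f                        # (10,)

# The pair count grows quadratically with the number of objects:
# 196 objects (a 14x14 grid) already give 196**2 ordered pairs.
vgg_pairs = (14 * 14) ** 2
print(pairs.shape, answer_logits.shape, vgg_pairs)
```

The quadratic pair count is the main cost to keep in mind when choosing which layer's features to use as objects.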


## Strengths

- Very simple idea, clearly explained, performs well. Somewhat shocked that it
hasn't been tried before.

## Weaknesses / Notes

Fairly simple idea: let a feedforward network
operate on all pairs of object representations and figure out the relations
necessary for the downstream task with end-to-end training. It is fairly general in its design:
neither the relations nor the object representations are hand-designed. For
RGB images, objects are spatially distinct convolutional layer features; for text,
they are LSTM encodings of supporting facts; and so on. This module can be dropped
in and combined with more sophisticated networks to improve performance at VQA.

RNs also offer an alternative design choice to prior works on CLEVR that have
an explicit notion of programs or modules with specialized roles (which need to be pre-defined).
RNs instead let these relations emerge, which reduces the dependency on hand-designed
modules while adding an architectural inductive bias for
the network to reason about relations (earlier end-to-end VQA models didn't have
the capacity to figure out relations).
