One of the dominant narratives of the deep learning renaissance has been the value of well-designed inductive bias - structural choices that shape what a model learns. The biggest example of this can be found in convolutional networks, where models achieve a dramatic parameter reduction by having features maps learn local patterns, which can then be re-used across the whole image. This is based on the prior belief that patterns in local images are generally locally contiguous, and so having feature maps that focus only on small (and gradually larger) local areas is a good fit for that prior. This paper operates in a similar spirit, except its input data isn’t in the form of an image, but a graph: the social graph of multiple agents operating within a Multi Agent RL Setting. In some sense, a graph is just a more general form of a pixel image: where a pixel within an image has a fixed number of neighbors, which have fixed discrete relationships to it (up, down, left, right), nodes within graphs have an arbitrary number of nodes, which can have arbitrary numbers and types of attributes attached to that relationship. The authors of this paper use graph networks as a sort of auxiliary information processing system alongside a more typical policy learning framework, on tasks that require group coordination and knowledge sharing to complete successfully. For example, each agent might be rewarded based on the aggregate reward of all agents together, and, in the stag hunt, it might require collaborative effort by multiple agents to successfully “capture” a reward. Because of this, you might imagine that it would be valuable to be able to predict what other agents within the game are going to do under certain circumstances, so that you can shape your strategy accordingly. The graph network used in this model represents both agents and objects in the environment as nodes, which have attributes including their position, whether they’re available or not (for capture-able objects), and what their last action was. As best I can tell, all agents start out with directed connections going both ways to all other agents, and to all objects in the environment, with the only edge attribute being whether the players are on the same team, for competitive environments. Given this setup, the graph network works through a sort of “diffusion” of information, analogous to a message passing algorithm. At each iteration (analogous to a layer), the edge features pull in information from their past value and sender and receiver nodes, as well as from a “global feature”. Then, all of the nodes pull in information from their edges, and their own past value. Finally, this “global attribute” gets updated based on summations over the newly-updated node and edge information. (If you were predicting attributes that were graph-level attributes, this global attribute might be where you’d do that prediction. However, in this case, we’re just interested in predicting agent-level actions). https://i.imgur.com/luFlhfJ.png All of this has the effect of explicitly modeling agents as entities that both have information, and have connections to other entities. One benefit the authors claim of this structure is that it allows them more interpretability: when they “play out” the values of their graph network, which they call a Relational Forward Model or RFM, they observe edge values for two agents go up if those agents are about to collaborate on an action, and observe edge values for an agent and an object go up before that object is captured. Because this information is carefully shaped and structured, it makes it easier for humans to understand, and, in the tests the authors ran, appears to also help agents do better in collaborative games. https://i.imgur.com/BCKSmIb.png While I find graph networks quite interesting, and multi-agent learning quite interesting, I’m a little more uncertain about the inherent “graphiness” of this problem, since there aren’t really meaningful inherent edges between agents. One thing I am curious about here is how methods like these would work in situations of sparser graphs, or, places where the connectivity level between a node’s neighbors, and the average other node in the graph is more distinct. Here, every node is connected to every other node, so the explicit information localization function of graph networks is less pronounced. I might naively think that - to whatever extent the graph is designed in a way that captures information meaningful to the task - explicit graph methods would have an even greater comparative advantage in this setting.