This paper performs a fascinating toy experiment, to try to see if something language-like in structure can be effectively induced in a population of agents, if they are given incentives that promote it. In some sense, a lot of what they find “just makes sense,” but it’s still a useful proof of concept to show that it can be done. The experiment they run takes place in a simple, two-dimensional world, with a fixed number of landmarks (representing locations goals need to take place), and agents, and actions. In this construction, each agent has a set of internal goals, which can either be actions (like “go to green landmark”) they themselves need to perform, or actions that they want another agent to perform. Agents’ goals are not visible to other agents, but all agents’ reward is defined to be the aggregated reward of all agents together, so if agent A has a goal involving an action of agent B’s, it’s in B’s “interest” to do that action, if it can be communicated to them. In order to facilitate other agents performing goals, at each step, each agent both takes an action, and also emits an “utterance”, which is just a discrete symbolic “word” out of some some fixed vocabulary of words (Note that applying “word” here is a but fuzzy; the agents do not pronounce or spell a character-based word, they just pick a discrete symbol that is playing the role of a word”. Even though other agents cannot see a given agent’s goals, they can see its public utterances, and so agents learn that communication is a way to induce other agents to perform desired actions. As a mathematically interesting aside: this setup, of allowing each agent to sample a single discrete word out of a small vocabulary at each setting, takes the deployment of some interesting computational tricks to accomplish. First off, in general, sampling a discrete single symbol out of a set of possible symbols is not differentiable, since it’s a discrete rather than continuous action, and derivatives require continuous functions. However, a paper from 2016 proposed a (heuristic) solution to this problem by means of the Gumbel Softmax Trick. This derives from the older “Gumbel Max Trick”, which is the mathematical fact that if you want to sample from a categorical distribution, a computationally easy way to do so is to add a variable sampled from a (0,1) Gumbel distribution to the log probability of each category, and then take the argmax of this as the index of the sample category (I’m not going to go another level down into why this is true, since I think it’s too far afield of the scope of this summary). Generally, argmax functions are also not differentiable. However, they can be approximated with softmaxes, which interpolate between a totally uniform and very nearly discrete-sample distribution based on a temperature parameter. In practice, or, at least, if this paper does what the original Gumbel Softmax paper did, during training, a discrete sample is taken, but a low-temperature continuous approximation is used for actual gradient calculation (i.e. for gradients, the model pretends that it used the continuous approximation rather than the discrete sample). https://i.imgur.com/0RpRJG2.png Coming back to the actual communication problem, the authors do find that under these (admittedly fairly sanitized and contrived) circumstances, agents use series of discrete symbols to communicate goals to other agents, which ends up looking a lot like a very simple language. https://i.imgur.com/ZF0EbN4.png As one might expect, in environments where there were only two agents, there was no symbol that ended up corresponding to “red agent” or “blue agent”, since each could realize that the other was speaking to it. However, in three-agent environments, the agents did develop symbols that clearly mapped to these categories, to specify who directions were being given to. The authors also tried cutting off verbal communication; in these situations, the agents used gaze and movement to try to signal what they wanted other agents to do. Probably most entertainingly, when neither verbal nor visual communication was allowed, agents would move to and “physically” push other agents to the location where their action needed to be performed.