Diversifying Training Pool Predictability for Zero-shot Coordination: A Theory of Mind Approach
Published in IJCAI, 2024
The challenge in constructing artificial social agents is to enable adaptation ability to novel agents, and is called zero-shot coordination (ZSC). A promising approach is to train the adaptive agents by interacting with a diverse pool of collaborators, assuming that the greater the diversity in other agents seen during training, the better the generalisation. In this paper, we explore an alternative procedure by considering the behavioural predictability of collaborators, i.e. whether their actions and intentions are predictable, and use it to select a diverse set of agents for the training pool. More specifically, we develop a pool of agents through self-play training during which agents’ behaviour evolves and has diversity in levels of behavioural predictability (LoBP) through its evolution. We construct an observer to compute the level of behavioural predictability for each version of the collaborators. To do so, the observer is equipped with the theory of mind (ToM) capability to learn to infer the actions and intentions of others. We then use an episodic memory based on the LoBP metric to maintain agents with different levels of behavioural predictability in the pool of agents. Since behaviours that emerge at the later training phase are more complex and meaningful, the memory is updated with the latest versions of training agents. Our extensive experiments demonstrate that LoBP-based diversity training leads to better ZSC than other diversity training methods.