A Multiagent Learning Algorithm for the Hanabi Game Using Artificial Intelligence
Hanabi is a cooperative game that challenges current AI algorithms because it requires modelling other players' mental states in order to interpret and anticipate their actions. While some agents can obtain near-perfect scores by agreeing on a shared strategy in advance, relatively little progress has been made in ad-hoc cooperation settings, where partners and strategies are not known ahead of time. In this paper, we show that agents trained through self-play using the popular Rainbow DQN architecture cooperate poorly with simple rule-based agents that were not seen during training, and that these agents fail to achieve good self-play scores when trained to play with any individual rule-based agent, or even a mix of such agents. Hanabi appeals to humans because it centres on theory of mind: the ability to reason about the intents, beliefs, and points of view of other agents from their observed behaviour. Reinforcement Learning (RL) faces an unusual difficulty in learning to be informative when observed by others: at its core, RL requires agents to explore in order to identify good policies, yet, when done naively, this randomness inevitably makes their behaviour during training less informative to others. We introduce a new deep multi-agent RL approach that exploits the centralized training phase to resolve this paradox.
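To make the ad-hoc cooperation failure concrete, here is a minimal toy sketch (the coordination game, function names, and agents are our own illustration, not the paper's experimental setup): two agents that each learned a perfectly valid but mutually incompatible convention score zero together, even though each scores perfectly in self-play.

```python
# Hypothetical illustration: a two-action coordination game where the
# reward is 1 iff both players choose the same action. This mirrors, in
# miniature, how self-play conventions can fail against unseen partners.

def play_episode(agent_a, agent_b):
    """One round of the toy coordination game: reward 1 iff actions match."""
    return 1.0 if agent_a() == agent_b() else 0.0

def mean_score(agent_a, agent_b, episodes=1000):
    """Average return over repeated episodes of the pairing."""
    return sum(play_episode(agent_a, agent_b) for _ in range(episodes)) / episodes

# A self-play-trained agent has converged on the arbitrary convention "0".
self_play_agent = lambda: 0
# An equally valid rule-based partner uses the convention "1".
rule_based_agent = lambda: 1

self_play_score = mean_score(self_play_agent, self_play_agent)   # 1.0
cross_play_score = mean_score(self_play_agent, rule_based_agent)  # 0.0
```

Each agent's policy is individually optimal under self-play, so no amount of further self-play training exposes the mismatch; only evaluation against unseen partners (cross-play) does.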