Building Self-Play Curricula Online by Playing with Expert Agents in Adversarial Games (2019)
Felipe Leno Da Silva, Anna Helena Reali Costa, and Peter Stone
Multiagent reinforcement learning algorithms are designed to enable an autonomous agent to adapt to an opponent's strategy based on experience. However, most such algorithms require a relatively large amount of experience to perform well. This requirement is problematic when opponent interactions are expensive, for example, when the agent has limited access to the opponent during training. In order to make good use of the opponent as a resource to support learning, we propose SElf-PLay by Expert Modeling (SEPLEM), an algorithm that models the opponent policy in a few episodes, and uses it to train in a simulated environment where it is cheaper to perform learning steps than in the real environment. Our empirical evaluation indicates that SEPLEM, by iteratively building a Curriculum of simulated tasks, achieves better performance than both only playing against the expert and using pure Self-Play techniques. SEPLEM is a promising technique to accelerate learning in multiagent adversarial tasks.
In Proceedings of the 8th Brazilian Conference on Intelligent Systems (BRACIS), Salvador, Bahia, Brazil, October 2019.
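
The loop described in the abstract (observe the expert for a few episodes, fit a model of its policy, then train cheaply against that model in simulation) can be illustrated with a toy sketch. This is not the authors' SEPLEM implementation. The game (rock-paper-scissors), the biased `expert_action` policy, the frequency-count opponent model, the bandit-style value update, and every function name and hyperparameter below are assumptions chosen only for illustration.

```python
import random
from collections import Counter

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def reward(mine, theirs):
    """Zero-sum payoff for the learner: +1 win, 0 draw, -1 loss."""
    if mine == theirs:
        return 0.0
    return 1.0 if BEATS[mine] == theirs else -1.0


def expert_action(rng):
    """Hypothetical expert opponent, heavily biased toward rock."""
    return rng.choices(ACTIONS, weights=[0.8, 0.1, 0.1])[0]


def model_opponent(observed):
    """Estimate the opponent policy from observed actions (Laplace-smoothed counts)."""
    counts = Counter(observed)
    total = len(observed) + len(ACTIONS)
    return {a: (counts[a] + 1) / total for a in ACTIONS}


def train_in_simulation(model, q, rng, steps=2000, alpha=0.02, epsilon=0.1):
    """Cheap learning steps against the modeled opponent instead of the real one."""
    weights = [model[a] for a in ACTIONS]
    for _ in range(steps):
        explore = rng.random() < epsilon
        mine = rng.choice(ACTIONS) if explore else max(q, key=q.get)
        theirs = rng.choices(ACTIONS, weights=weights)[0]
        q[mine] += alpha * (reward(mine, theirs) - q[mine])
    return q


def run(iterations=3, real_episodes=20, seed=0):
    """Alternate a few expensive real episodes with many simulated ones."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    observed = []
    model = model_opponent(observed)
    for _ in range(iterations):
        for _ in range(real_episodes):  # limited access to the real expert
            mine = max(q, key=q.get)
            theirs = expert_action(rng)
            observed.append(theirs)
            q[mine] += 0.02 * (reward(mine, theirs) - q[mine])
        model = model_opponent(observed)  # refit the model on all data so far
        q = train_in_simulation(model, q, rng)
    return model, q


if __name__ == "__main__":
    model, q = run()
    print("modeled opponent policy:", model)
    print("greedy action:", max(q, key=q.get))
```

In this sketch the learner sees the expert for only 60 real episodes in total but takes 6,000 simulated steps, which mirrors the cost asymmetry the abstract motivates. Each refit of the model defines a new simulated task, a loose analogue of iteratively building a curriculum.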
