Cooperating with a Markovian Ad Hoc Teammate (2013)
This paper focuses on learning in the presence of a Markovian teammate in Ad hoc teams. A Markovian teammate's policy is a function of a set of discrete feature values derived from the joint history of interaction, where the feature values transition in a Markovian fashion on each time step. We introduce a novel algorithm ``Learning to Cooperate with a Markovian teammate'', or LCM, that converges to optimal cooperation with any Markovian teammate, and achieves safety with any arbitrary teammate. The novel aspect of LCM is the manner in which it satisfies the above two goals via efficient exploration and exploitation. The main contribution of this paper is a full specification and a detailed analysis of LCM's theoretical properties.
In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013.

Doran Chakraborty Ph.D. Alumni chakrado [at] cs utexas edu
Peter Stone Faculty pstone [at] cs utexas edu