Keepaway Pass+GetOpen: Learning Complementary Multiagent Behaviors

This web page provides supplementary material to the following paper.

Learning Complementary Multiagent Behaviors: A Case Study
Shivaram Kalyanakrishnan and Peter Stone
In Proceedings of the RoboCup International Symposium 2009. To appear.
PS, PDF, Abstract and BibTeX
A similar version of this paper was accepted at the Adaptive and Learning Agents Workshop at AAMAS 2009, Budapest, Hungary. A short version is to appear in Proceedings of the Eighth International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009).

In this work, we use Keepaway as a case study for learning multiagent behaviors. In Keepaway, a team of keepers (in red) strives to maintain possession of the ball inside a rectangular region, without losing it to the opposing team of takers (in blue). The keepers' objective is to maximize the duration of the episode. We decompose Keepaway into two distinct components: PASS, which describes the behavior of the keeper with the ball in deciding to hold the ball or pass it to some teammate, and GETOPEN, which describes the behavior of its teammates in moving to promising positions on the field for receiving passes.

Our paper considers various policies (RANDOM, HAND-CODED, and LEARNED) for PASS and GETOPEN. This web page provides videos of the Keepaway task being executed while the keepers follow these variants of PASS and GETOPEN. Each execution corresponds to a pairing of some PASS and some GETOPEN policy. In order to facilitate visual comparison, videos that share a common PASS or GETOPEN policy are grouped together. Note that learning is switched off during these recordings.
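As a rough illustration of how the pairings can be enumerated, the sketch below crosses the three policy variants for PASS with the three for GETOPEN, using the abbreviations listed further down this page. The label format (for example "P:HC-GO:L") and the function names are assumptions made for this example only, and the sketch does not imply that every pairing has a video on this page.

```python
from itertools import product

# Abbreviations as given on this page: RANDOM -> R, HAND-CODED -> HC, LEARNED -> L.
POLICY_ABBREV = {"RANDOM": "R", "HAND-CODED": "HC", "LEARNED": "L"}


def pairing_label(pass_policy: str, getopen_policy: str) -> str:
    """Build a short label for one PASS/GETOPEN pairing.

    The "P:..-GO:.." format is a hypothetical convention for this sketch.
    """
    return f"P:{POLICY_ABBREV[pass_policy]}-GO:{POLICY_ABBREV[getopen_policy]}"


def all_pairings() -> list[tuple[str, str]]:
    """Return every (PASS policy, GETOPEN policy) combination."""
    return list(product(POLICY_ABBREV, repeat=2))


if __name__ == "__main__":
    for pass_policy, getopen_policy in all_pairings():
        print(pairing_label(pass_policy, getopen_policy))
```

Grouping the resulting labels by their first or second component gives the sets of pairings that share a common PASS or GETOPEN policy.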

In each video, the corresponding pairing is executed for 25 episodes. Because the Keepaway task is highly stochastic, the hold time achieved in a single recording need not match the average performance of the policy, which is reported below each video. Each frame shows the current cycle number, as well as the total number of cycles. Each cycle lasts 100 milliseconds in simulated time. Policy names are abbreviated as in the paper (PASS as P, GETOPEN as GO, RANDOM as R, HAND-CODED as HC, and LEARNED as L). The videos are in swf (Flash) format, and require that your browser have a Flash plug-in. Please use the red buttons to play, pause, and step through the videos.
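To relate the cycle counts shown in each frame to hold times in seconds, the following minimal sketch applies the 100 ms per cycle figure stated above. The function names are hypothetical and the episode lengths in the usage example are made-up numbers, not results from the paper.

```python
CYCLE_MS = 100  # each cycle lasts 100 milliseconds of simulated time (stated on this page)


def cycles_to_seconds(cycles: int) -> float:
    """Convert a number of simulator cycles to seconds of simulated time."""
    return cycles * CYCLE_MS / 1000.0


def mean_hold_time_seconds(episode_cycles: list[int]) -> float:
    """Average episode duration in seconds, given each episode's length in cycles."""
    if not episode_cycles:
        raise ValueError("need at least one episode")
    total_seconds = sum(cycles_to_seconds(c) for c in episode_cycles)
    return total_seconds / len(episode_cycles)


if __name__ == "__main__":
    # Illustrative episode lengths only; these are not measurements.
    example_episodes = [100, 200]
    print(mean_hold_time_seconds(example_episodes))
```

An average over the 25 episodes in one recording is still a small sample, which is why it can differ from the average performance reported below the video.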