Half Field Offense in RoboCup Soccer: A Problem for Multiagent Reinforcement Learning

(See the UT Austin Villa page for other robot-soccer-related projects from this lab.)

Half Field Offense is a subtask in RoboCup simulated soccer, modeling a situation in which the offense of one team has to get past the defense of the opposition in order to score goals. Half Field Offense is an extension of Keepaway, a simpler task that has been studied in the past. We pose Half Field Offense as a problem for reinforcement learning, and present a solution that improves on the method used for learning Keepaway. Complete details are provided in our paper:

  • "Half Field Offense in RoboCup Soccer: A Multiagent Reinforcement Learning Case Study." (ps | pdf)
    Shivaram Kalyanakrishnan, Yaxin Liu, and Peter Stone.
    RoboCup International Symposium 2006.

  • Task Description

    In Half Field Offense, an offense team of m players has to outsmart a defense team of n players, including a goalie, to score a goal. The task is played over one half of the soccer field, and begins near the half field line, with the ball close to one of the offense players. The offense team tries to maintain possession, move up the field, and score. The defense team tries to take the ball away from the offense team.

    4v5 Half Field Offense Screenshot.

    The task is episodic, and an episode ends when one of three events occurs:

  • A goal is scored,
  • The ball is out of bounds, or
  • A defender gets possession of the ball (including the goalie catching the ball).
    The figure shows a screenshot from a half field offense task. The objective of learning in this task is to increase the goal-scoring performance of the offense team. The offense team player who has possession of the ball (denoted O1) has to execute one of the following actions:
  • Pass_k: A direct kick to the teammate that is the k-th closest to the ball, where k = 2, 3, ..., m.
  • Dribble: A kick that is generally in the direction of the goal, but at the same time avoids opposition players who are close.
  • Shoot: A kick towards the goal in the direction bisecting the widest open angle available between the goalie, other defenders, and the goalposts.
    Offense players other than O1 follow a fixed, static policy. The offense players' behavior is described below:

    Offense Behavior.

    The defense team also follows a fixed policy. The learning problem then is simply to decide which action O1 must take in any given state. A detailed description of the state representation is presented in our paper.
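
    The Shoot action described above aims along the bisector of the widest open angle between the goalposts and the defenders. The following is only a minimal sketch of that geometric idea, not the code used in the task. It assumes 2D coordinates with the goal lying in the +x direction from the ball (so angles do not wrap around), and it treats every defender, goalie included, as a point with no body width. The function name `shoot_direction` and its parameters are hypothetical.

```python
import math

def shoot_direction(ball, post_a, post_b, defenders):
    """Return (angle, width) in radians for the widest open gap.

    Assumption: the goal lies in the +x direction from the ball, and
    defenders are treated as points (no body radius).
    """
    def angle_to(point):
        return math.atan2(point[1] - ball[1], point[0] - ball[0])

    # Angular window spanned by the two goalposts as seen from the ball.
    lo, hi = sorted((angle_to(post_a), angle_to(post_b)))

    # Only defenders whose direction falls inside that window split it.
    blockers = sorted(a for a in map(angle_to, defenders) if lo < a < hi)
    bounds = [lo] + blockers + [hi]

    # Pick the widest gap between consecutive boundaries.
    start, width = max(
        ((a, b - a) for a, b in zip(bounds, bounds[1:])),
        key=lambda gap: gap[1],
    )
    # The shot is aimed along the bisector of that gap.
    return start + width / 2.0, width
```

    For example, `shoot_direction((0, 0), (10, -10), (10, 10), [(10, 5)])` picks the gap between the lower post and the defender, since it is wider than the gap between the defender and the upper post.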

    4v5 Half Field Offense

    4v5 half field offense (depicted in the figure) is a version of the task involving four offense players and five defense players. Here we describe it in detail. The videos and the results presented subsequently all relate to 4v5 half field offense.

    Episode Start: Each episode of the task begins with the offense players occupying the vertices of a square with a side of 20m. One side of the square always adjoins the half field line, but its position is varied randomly between the side lines. The assignment of the players to the corners of this square is also done randomly every episode, so that they all learn general and roughly homogeneous behavior. At the start of the episode, two defense players are stationed in the middle of the square described by the offense players, while two others and the goalie begin inside the penalty area.
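
    The episode-start procedure for the offense players can be sketched as follows. This is an illustration under stated assumptions rather than the actual setup code: the coordinate frame (x = 0 on the half field line, x increasing toward the goal, y = 0 at the field's centre), the 68m pitch width, and the function name `sample_episode_start` are all assumptions made for the example. Only the 20m side length and the random placement come from the description above.

```python
import random

FIELD_WIDTH = 68.0   # assumption: pitch width in metres
SQUARE_SIDE = 20.0   # side of the starting square, from the task description

def sample_episode_start(offense_ids, rng=random):
    """Map each of four offense players to a random corner of the start square.

    Assumed frame: x = 0 is the half field line, x grows toward the goal,
    and the side lines are at y = -FIELD_WIDTH/2 and y = +FIELD_WIDTH/2.
    """
    half_width = FIELD_WIDTH / 2.0
    # Slide the square randomly between the side lines.
    y_min = rng.uniform(-half_width, half_width - SQUARE_SIDE)
    corners = [
        (0.0, y_min),
        (0.0, y_min + SQUARE_SIDE),
        (SQUARE_SIDE, y_min),
        (SQUARE_SIDE, y_min + SQUARE_SIDE),
    ]
    # Shuffle the players so corner assignment differs from episode to episode.
    players = list(offense_ids)
    rng.shuffle(players)
    return dict(zip(players, corners))
```

    Passing a seeded `random.Random` instance as `rng` makes the sampled start reproducible, which can be convenient when comparing policies on the same sequence of episodes.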

    GetOpen: Offense players other than the one closest to the ball keep in formation according to the GetOpen function. In our implementation, the players simply stick to a trapezoidal formation, with one player at each vertex of the trapezoid. The parallel sides of the trapezoid are 10m apart, and are themselves parallel to the half field line. The side closer to the goal is 16m long, while the one closer to the half field line is 12m long. As the player with the ball dribbles forward or passes the ball, the other players continually move to the position prescribed by the trapezoid.
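
    The trapezoid geometry can be made concrete with a short sketch. The description above does not say how the trapezoid is anchored relative to the ball, nor whether it is symmetric, so the example assumes an isosceles trapezoid centred on a caller-supplied reference point. The function `trapezoid_positions`, its parameters, and the coordinate frame (x increasing toward the goal) are hypothetical.

```python
DEPTH = 10.0       # distance between the two parallel sides (metres)
FAR_LENGTH = 16.0  # side closer to the goal
NEAR_LENGTH = 12.0 # side closer to the half field line

def trapezoid_positions(center_x, center_y):
    """Return the four vertex positions of the formation trapezoid.

    Assumptions: x increases toward the goal, the parallel sides run
    along y, and the trapezoid is symmetric about (center_x, center_y).
    """
    x_near = center_x - DEPTH / 2.0  # side nearer the half field line
    x_far = center_x + DEPTH / 2.0   # side nearer the goal
    return [
        (x_near, center_y - NEAR_LENGTH / 2.0),
        (x_near, center_y + NEAR_LENGTH / 2.0),
        (x_far, center_y - FAR_LENGTH / 2.0),
        (x_far, center_y + FAR_LENGTH / 2.0),
    ]
```

    Recomputing these vertices as the reference point moves up the field gives the target positions toward which the players without the ball would continually move.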

    Defense Team Policy: The defense team follows a static policy. One defense player actively tries to attack O1, while another seeks to mark the player who is most open to receive a pass from O1. The other two defense players stay between the goal and the two offense players closest to the goal line. The goalie, who can execute a "catch" action, can typically block shots that are within 2m of its reach on either side. For the actual implementation of the 4v5 half field offense task, we used the code base made available by the UvA 2003 RoboCup Soccer Team.


    Example Policies and Videos

    The solution that the offense players have to learn for the half field offense task is a policy, which maps state variables to actions. In the 4v5 version of the task, the players have to learn which of five actions to take in any given state, which is itself given by a 17-dimensional feature vector. Below we list some policies for 4v5 half field offense that we compare with the learned policy. Videos of the task following these policies were generated using the "robocup2flash" tool created by Thilo Girmann. They are in ".swf" format, and your browser needs to have the Macromedia Flash plug-in in order to play them. Please press the red "play" button (at the top of the soccer field) to start the videos.
  • Random Policy: Under this policy, the offense player chooses randomly from the action set {Pass2, Pass3, Pass4, Dribble, Shoot} in any state.
  • Handcoded Policy: This is a policy we have manually engineered to register good performance on the 4v5 half field offense task. Here we also provide its complete description.
  • Policy learned using an algorithm without communication (similar to the one used for Keepaway), after 20,000 episodes of training.
  • Policy learned using our algorithm with communication after 20,000 episodes of training. This policy registers the best performance.
  • UvA 2003 Offense Policy (note that the offense players here do not adhere to the 4v5 half field offense behavior specifications).
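
    As a point of reference for the policies listed above, the action set and the random policy can be sketched in a few lines. This is a minimal illustration, not the code used in the experiments: the helper names `action_set` and `random_policy` are hypothetical, and the sketch assumes that "chooses randomly" means uniformly at random. The state is represented here simply as a list of 17 numbers, which the random policy ignores.

```python
import random

def action_set(num_offense):
    """Actions available to the player with the ball: one pass per
    teammate (k = 2..m), plus Dribble and Shoot."""
    passes = [f"Pass{k}" for k in range(2, num_offense + 1)]
    return passes + ["Dribble", "Shoot"]

def random_policy(state, actions, rng=random):
    """Ignore the state and pick an action uniformly at random
    (uniformity is an assumption of this sketch)."""
    return rng.choice(actions)

# 4v5 half field offense: four offense players, hence five actions.
ACTIONS_4V5 = action_set(4)
```

    A learned policy would have the same interface as `random_policy`, taking the 17-dimensional state and returning one of the five actions, but would use the state to choose.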