Half Field Offense in RoboCup Soccer: A Problem for Multiagent Reinforcement Learning
(See the UT Austin Villa page for other robot-soccer-related projects from this lab.)
Half Field Offense is a subtask in RoboCup simulated soccer,
modeling a situation in which the offense of one team has to get past the defense
of the opposition in order to score goals. Half Field Offense is an extension
of Keepaway, a simpler task that has been studied in the past. We pose
Half Field Offense as a problem for reinforcement learning, and present
a solution that improves the method used for learning Keepaway. Complete
details are provided in our paper:
Task Description
In Half Field Offense, an offense team of m players has to outsmart a defense
team of n players, including a goalie, to score a goal. The task is played over
one half of the soccer field, and begins near the half field line, with the ball close to one of the
offense players. The offense team tries to maintain possession, move up the field, and
score. The defense team tries to take the ball away from the offense
team.
The task is episodic, and an episode ends when one of three events occurs:
The figure to the right shows a screen-shot from a half field offense task.
The objective of learning in this task is
to increase the goal-scoring performance of the offense team. The offense
team player who has possession of the ball (denoted O1) has to execute
one of the following actions:
Offense players other than O1 follow a fixed, static policy. The offense
players' behavior is described below:
The defense team also follows a fixed policy. The learning problem then is simply
to decide which action O1 must take in any given state. A detailed
description of the state representation is presented in our papers.
4v5 Half Field Offense
4v5 half field offense (depicted in the figure) is a version of the task
involving four offense players and five defense players. Here we describe
it in detail. The videos and the results presented subsequently all relate
to 4v5 half field offense.
Episode Start: Each episode of the task begins with the offense players occupying the
vertices of a square with a side of 20m. One side of the square
always adjoins the half field line, but its position is varied
randomly between the side lines. The assignment of the players to the
corners of this square is also done randomly every episode, so that
they all learn general and roughly homogeneous behavior. At the start
of the episode, two defense players are stationed in the middle of the
square described by the offense players, while two others and the
goalie begin inside the penalty area.
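
The offense start configuration described above can be sketched as follows. This is a minimal illustration, not the code used in our implementation: the coordinate frame (x = 0 at the half field line, offense attacking toward +x), the 68m field width, and the function name `sample_offense_start` are all assumptions made for the example. Defender placement is left out because the text does not give exact positions.

```python
import random

# Assumed geometry: a 68 m wide pitch, with x = 0 on the half field line and
# y running between the side lines. Only the 20 m side comes from the text.
FIELD_WIDTH = 68.0
SQUARE_SIDE = 20.0


def sample_offense_start(num_players=4, rng=random):
    """Return {player_id: (x, y)} for one episode start (hypothetical helper)."""
    half_width = FIELD_WIDTH / 2.0
    # Slide the square randomly between the side lines.
    y_low = rng.uniform(-half_width, half_width - SQUARE_SIDE)
    corners = [
        (0.0, y_low),                             # on the half field line
        (0.0, y_low + SQUARE_SIDE),               # on the half field line
        (SQUARE_SIDE, y_low),                     # 20 m toward the goal
        (SQUARE_SIDE, y_low + SQUARE_SIDE),       # 20 m toward the goal
    ]
    # Random assignment of players to corners, redone every episode.
    rng.shuffle(corners)
    return {pid: corner for pid, corner in enumerate(corners[:num_players], start=1)}
```

Passing a seeded `random.Random` instance as `rng` makes the sampled starts reproducible across runs.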
GetOpen: Offense players other than the one closest to the ball
keep in formation according to the GetOpen function. In our
implementation, the players simply stick to a trapezoidal formation,
with one player at each vertex of the trapezoid. The parallel sides
of the trapezoid are 10m apart, and are themselves parallel to the
half field line. The side closer to the goal is 16m long, while the
one closer to the half field line is 12m long. As the player with the
ball dribbles forward or passes the ball, the other players
continually move to the position prescribed by the trapezoid.
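
The trapezoid geometry can be written down directly from the dimensions above. The sketch below only computes the four vertices; how the actual GetOpen function anchors the trapezoid on the field and assigns players to vertices is not stated here, so the reference point `(center_x, center_y)` and the function name are hypothetical, and x is assumed to increase toward the goal.

```python
# Dimensions from the task description, in metres.
DEPTH = 10.0       # distance between the two parallel sides
GOAL_SIDE = 16.0   # length of the side closer to the goal
HALF_SIDE = 12.0   # length of the side closer to the half field line


def trapezoid_vertices(center_x, center_y):
    """Vertices of the formation around an assumed reference point.

    Both parallel sides run along y, i.e. parallel to the half field line.
    """
    back_x = center_x - DEPTH / 2.0    # side nearer the half field line
    front_x = center_x + DEPTH / 2.0   # side nearer the goal
    return [
        (back_x, center_y - HALF_SIDE / 2.0),
        (back_x, center_y + HALF_SIDE / 2.0),
        (front_x, center_y + GOAL_SIDE / 2.0),
        (front_x, center_y - GOAL_SIDE / 2.0),
    ]
```

For example, `trapezoid_vertices(20.0, 0.0)` yields `(15, -6)`, `(15, 6)`, `(25, 8)` and `(25, -8)`.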
Defense Team Policy: The defense team follows a static policy. One defense player actively
tries to attack O1, while another seeks to mark the player who is
most open to receive a pass from O1. The other two defense
players stay between the goal and the two offense players closest to
the goal line. The goalie, who can execute a "catch" action, can
typically block shots that are within 2m of its reach on either
side. For the actual implementation of the 4v5 half field offense
task, we used the code base made available by the
UvA 2003 RoboCup Soccer Team.
Example Policies and Videos
The solution that the offense players have to learn for the half
field offense task is a policy, which maps state variables
to actions. In the 4v5 version of the task, the players have to learn
which of five actions to take in any given state, which is itself
given by a 17-dimensional feature vector.
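
To make the notion of a policy concrete, the sketch below treats it as a function from a 17-element state to an action index in 0..4. Both functions are illustrative assumptions: the uniformly random baseline and the linear scoring rule are stand-ins chosen for brevity, not the policies or the learning method described in our paper, and actions are referred to by index only.

```python
import random

NUM_FEATURES = 17  # state dimensionality in 4v5, from the text
NUM_ACTIONS = 5    # actions available to O1 in 4v5, from the text


def random_policy(state, rng=random):
    """Ignore the state and pick one of the five actions uniformly."""
    assert len(state) == NUM_FEATURES
    return rng.randrange(NUM_ACTIONS)


def greedy_linear_policy(state, weights):
    """Pick the action with the highest linear score.

    `weights` holds one 17-element weight vector per action. The linear
    form is a hypothetical stand-in for whatever value estimate is learned.
    """
    assert len(state) == NUM_FEATURES and len(weights) == NUM_ACTIONS
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    # Ties resolve to the lowest action index.
    return max(range(NUM_ACTIONS), key=scores.__getitem__)
```

Learning then amounts to adjusting whatever parameters the policy depends on (here, `weights`) so that the offense scores more often.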
Below we list some policies for 4v5 half field offense that we compare
with the learned policy. Videos of the task following these policies
were generated using the "robocup2flash" tool created by Thilo Girmann.
They are in ".swf" format, and your browser needs the
Macromedia Flash plug-in to
play them. Press the red "play" button (above the soccer field)
to start the videos.