This is the official site of the UT Austin Villa 3D Simulation team from the Department of Computer Science at the University of Texas at Austin.

This web page provides supplementary material to the following paper:

Design and Optimization of an Omnidirectional Humanoid Walk: A Winning Approach at the RoboCup 2011 3D Simulation Competition

Patrick MacAlpine, Samuel Barrett, Daniel Urieli, Victor Vu, Peter Stone

Published in the Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12) in Toronto, Ontario, Canada.

The full paper can be found here.


This page provides details on the optimization of walk parameters for an omnidirectional walk engine which was the key component in UT Austin Villa winning the 2011 RoboCup 3D simulation competition. Results from the competition, including videos of game action, are linked off the UT Austin Villa homepage. The remainder of this page focuses only on the learned walk.

For the 2011 RoboCup 3D simulation competition, UT Austin Villa implemented an omnidirectional walk engine and learned optimized parameters for it. This was a vast improvement over the 2010 team's fixed, skills-based walk.

To optimize the walk engine parameters, the agent first learned parameters by measuring how far it could dribble a ball during a driveBallToGoal optimization task. Unfortunately, dribbling a ball to the goal was not fully representative of the many situations encountered in an actual game, so the walk learned from this task was not as stable as desired.

To better represent the many situations encountered in a game, a new set of parameters was learned by having the agent move through an obstacle course consisting of a set series of target positions during a goToTarget optimization task. To increase the agent's speed, this same goToTarget optimization task was also used to learn a sprint parameter set, active when the agent is oriented within 15 degrees of its target, to be used together with the goToTarget parameter set. Finally, an additional parameter set for walking around the ball, known as the positioning parameter set, was learned by having the agent start from various positions and dribble the ball toward the goal through the driveBallToGoal2 optimization task. Because each new parameter set was optimized in conjunction with the previously learned parameter sets, the agent can transition seamlessly among all of them.
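The runtime choice among the three parameter sets can be pictured roughly as follows. This is a minimal illustrative sketch, not the team's actual code: the function name, arguments, and the rule for when the positioning set applies are assumptions; only the 15-degree sprint condition comes from the description above.

```python
# Hypothetical sketch of selecting among the three learned walk parameter
# sets each step. Names and the positioning condition are illustrative.
SPRINT_ANGLE_DEG = 15.0  # sprint set is active within 15 degrees of the target


def select_parameter_set(angle_to_target_deg, positioning_near_ball):
    """Return which walk parameter set to use for the current step."""
    if positioning_near_ball:
        return "positioning"   # P: walking around the ball
    if abs(angle_to_target_deg) <= SPRINT_ANGLE_DEG:
        return "sprint"        # S: oriented toward the target
    return "goToTarget"        # T: general-purpose walking
```

For example, an agent facing 10 degrees off its target and not maneuvering around the ball would use the sprint set.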

Sample videos of the walk during different stages of learning can be found below.


Initial Walk Parameters

Omnidirectional walk using initial (unoptimized) parameters that were tuned to work with the physical Nao robot. This agent lost to a team of agents with the 2010 fixed, skills-based walk by an average goal difference of -1.65 with a standard error of .11 across 100 games.
Download video: mp4




Partial DriveBallToGoal Walk Learned Parameters

Omnidirectional walk using partial (11 out of 14) parameters that were learned through the driveBallToGoal optimization. This agent beat a team of agents with the initial walk parameters by an average goal difference of 1.72 with a standard error of .11 across 100 games. It lost to a team of agents with the 2010 fixed, skills-based walk, however, by an average goal difference of -.28 with a standard error of .07 across 100 games.
Download video: mp4




DriveBallToGoal Optimization

The driveBallToGoal agent dribbling the ball toward the goal while executing the driveBallToGoal optimization task. The agent's fitness is measured by how far it can dribble the ball toward the goal in 30 seconds. This agent beat a team of agents with the initial walk parameters by an average goal difference of 5.54 with a standard error of .14 across 100 games. It also beat a team of agents with the 2010 fixed, skills-based walk by an average goal difference of 2.99 with a standard error of .12 across 100 games.
Download video: mp4




GoToTarget Optimization

Final agent navigating an obstacle course of targets it is told to move toward while executing the goToTarget optimization task. The agent's fitness is measured by how far/fast it can move toward each target (shown as a magenta dot on the field). It is penalized for any movement when told to stop and is also penalized if it falls over. This optimization was used to learn both the goToTarget and sprint parameter sets. Using the goToTarget parameter set improved the agent's performance such that it was able to beat a team of agents using the driveBallToGoal parameter set by an average goal difference of 2.04 with a standard error of .11 across 100 games. Adding the sprint parameter set increased the speed of the agent from .64 m/s to .71 m/s when timed walking forward for ten seconds starting from a complete stop. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint.
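The goToTarget fitness described above combines progress toward each target with penalties for moving when told to stop and for falling. The sketch below is a minimal rendering of that description; the penalty magnitudes and bookkeeping are assumptions, not the values used in the actual optimization.

```python
# Minimal sketch of the goToTarget fitness: reward progress toward each
# target, penalize movement while commanded to stop, penalize falls.
# Both penalty constants below are assumed for illustration.
FALL_PENALTY = 5.0        # assumed penalty per fall
STOP_PENALTY_SCALE = 1.0  # assumed scale on movement while told to stop


def go_to_target_fitness(progress_per_target, movement_while_stopped, falls):
    """progress_per_target: meters moved toward each target position.
    movement_while_stopped: meters moved while commanded to stop.
    falls: number of times the agent fell over."""
    return (sum(progress_per_target)
            - STOP_PENALTY_SCALE * movement_while_stopped
            - FALL_PENALTY * falls)
```

A candidate parameter set that covers more ground without falling or drifting when stopped scores higher, which is the behavior the obstacle course is designed to reward.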
Download video: mp4




DriveBallToGoal2 Optimization

Final agent dribbling the ball toward the goal from multiple starting points while executing the driveBallToGoal2 optimization task. The agent's fitness is measured by how far it can dribble each ball in 15 seconds toward the goal and is penalized if it dribbles the ball backwards. At the end of every 15 seconds the agent performs a set series of movements to check its stability and is penalized if it falls over. The optimization is run in simulation time which is much faster than real time. This optimization was used to learn the positioning parameter set. Adding the positioning parameter set improved the agent's performance such that it was able to beat a team of agents using only the goToTarget and sprint parameter sets by an average goal difference of .15 with a standard error of .07 across 100 games. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, P (cyan) = positioning.
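The driveBallToGoal2 fitness sums the signed dribbling progress over each 15-second window, penalizing backward dribbling and any fall during the stability check. The sketch below follows that description; the penalty weights and function signature are illustrative assumptions.

```python
# Minimal sketch of the driveBallToGoal2 fitness: sum progress toward the
# goal from each starting point, with backward dribbling penalized and a
# penalty for falling during the stability check. Weights are assumed.
BACKWARD_PENALTY_SCALE = 2.0  # assumed extra weight on backward dribbling
STABILITY_FALL_PENALTY = 5.0  # assumed penalty per fall in the check


def drive_ball_to_goal2_fitness(progress_per_start, falls_in_checks):
    """progress_per_start: signed meters each ball was dribbled toward
    the goal in its 15-second window (negative means backwards).
    falls_in_checks: falls during the stability-check movements."""
    total = 0.0
    for p in progress_per_start:
        # backward dribbling counts against the agent more heavily
        total += p if p >= 0 else BACKWARD_PENALTY_SCALE * p
    return total - STABILITY_FALL_PENALTY * falls_in_checks
```

Because the task runs in simulation time, many such candidate evaluations can be completed far faster than real time, which makes this kind of episodic fitness practical to optimize.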
Download video: mp4




Final Walk Learned Parameters

Omnidirectional walk using the final optimized parameter sets. The walk parameter set the agent is using is displayed above the agent: T (red) = goToTarget, S (yellow) = sprint, P (cyan) = positioning. This agent beat a team of agents with the initial walk parameters by an average goal difference of 8.84 with a standard error of .12 across 100 games. It also beat a team of agents with the 2010 fixed, skills-based walk by an average goal difference of 6.32 with a standard error of .13 across 100 games.
Download video: mp4


For any questions, please contact Patrick MacAlpine.