TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning (2012)
Author: Todd Hester

TEXPLORE is a Reinforcement Learning (RL) algorithm, that is, a method for agents to learn to perform sequential decision-making tasks through interaction with their environment. This research is focused on applying RL to more real-world problems, particularly learning on robots. Enabling robots to learn will make them generally more useful, as they will not require pre-programming for every task and environment.

In RL, an agent is in some state in the world (e.g. a particular chess board configuration or a location in a city) and has some set of actions it can take (e.g. chess moves, turns at an intersection). Upon taking an action, it reaches a new state, and receives a scalar reward (e.g. +1 for winning the chess game, -1 for losing, and 0 otherwise; or minus the time each road segment took). The goal of the agent is to learn which action to take in each state to maximize its reward over time.
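The state, action, and reward loop described above can be sketched in a few lines of Python. This is only an illustration: the `ChainWorld` corridor, its reward of +1 at the goal, and the `run_episode` and `always_right` helpers are hypothetical names invented here, not part of TEXPLORE or its ROS package.

```python
class ChainWorld:
    """Hypothetical 5-state corridor: start in state 0, goal in the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left; walls clamp the move.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0  # scalar reward: +1 at the goal, 0 otherwise
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal or the step limit; return total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action for this state
        state, reward, done = env.step(action)  # world returns new state and reward
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return 1


total = run_episode(ChainWorld(), always_right)
```

A learning agent would replace `always_right` with a policy that it improves from the rewards it observes.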

Applying RL to real-world problems such as robot control raises a number of issues. First, learning must happen within a limited number of actions. Methods that take thousands or millions of actions to learn are not feasible for a robot, as the robot is likely to break, wear out, run out of battery power, or overheat before that many actions can be taken. Second, learning must take place in real time. We would like the RL agent to be in continual control of the robot, rather than controlling it for short periods followed by long pauses while it computes its next action. Finally, there are issues with handling the continuous state space of robots and with the delays that many mechanical actuators have.

To address these issues, we have developed an algorithm called TEXPLORE. It is a model-based RL algorithm, which means it learns a model of the state transition and reward dynamics of the domain and then uses its model to plan a policy, enabling it to learn in fewer actions than many model-free approaches. It also utilizes a real-time architecture that performs the model learning and planning in parallel threads, so the agent can act in real time. I've released a ROS package with the TEXPLORE source code that can be easily applied to any robot running ROS.
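To make the model-based idea concrete, here is a minimal sketch of "learn a model, then plan with it". It is not TEXPLORE's model learner or planner: a count-based lookup table and plain value iteration are swapped in for brevity, and it runs in a single thread. The names `TabularModel` and `plan`, the discount of 0.9, and the hand-written corridor experience are all assumptions made for illustration.

```python
from collections import defaultdict


class TabularModel:
    """Count-based estimate of transition probabilities and mean rewards."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s_next: n}
        self.reward_sum = defaultdict(float)                  # (s, a) -> summed reward
        self.visits = defaultdict(int)                        # (s, a) -> visit count

    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def predict(self, s, a):
        """Return ([(s_next, prob), ...], expected reward) for a state-action pair."""
        n = self.visits[(s, a)]
        if n == 0:
            # Assumption: an unvisited pair is treated as a zero-reward self-loop.
            return [(s, 1.0)], 0.0
        probs = [(s2, c / n) for s2, c in self.counts[(s, a)].items()]
        return probs, self.reward_sum[(s, a)] / n


def plan(model, states, actions, gamma=0.9, sweeps=100):
    """Run value iteration on the learned model and return a greedy policy."""
    values = {s: 0.0 for s in states}

    def q(s, a):
        probs, r = model.predict(s, a)
        return r + gamma * sum(p * values.get(s2, 0.0) for s2, p in probs)

    for _ in range(sweeps):
        for s in states:
            values[s] = max(q(s, a) for a in actions)
    return {s: max(actions, key=lambda a: q(s, a)) for s in states}


# Hypothetical experience from a 5-state corridor whose goal is state 4.
experience = []
for s in range(4):
    experience.append((s, 0, 0.0, max(s - 1, 0)))             # action 0: move left
    experience.append((s, 1, 1.0 if s == 3 else 0.0, s + 1))  # action 1: move right

model = TabularModel()
for s, a, r, s_next in experience:
    model.update(s, a, r, s_next)

policy = plan(model, states=range(5), actions=[0, 1])
```

With only eight observed transitions, the planner already selects "move right" in states 0 through 3, which illustrates why planning on a learned model can need fewer real actions than learning values directly from experience.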

A complete description of the TEXPLORE algorithm, with videos detailing each aspect of it, is available on my webpage.

Video: Learning to Score Penalty Kicks via Reinforcement Learning

This is the accompanying video for our ICRA 2010 paper, in which our learning algorithm controls the robot as it learns to score penalty kicks.

Todd Hester Postdoctoral Alumni todd [at] cs utexas edu
Peter Stone Faculty pstone [at] cs utexas edu
Michael Quinlan Formerly affiliated Research Scientist mquinlan [at] cs utexas edu
Intrinsically Motivated Model Learning for a Developing Curious Agent 2012
Todd Hester and Peter Stone, In Eleventh International Conference on Autonomous Agents and Multiagent Systems - Adaptive Learning Agents Workshop (AAMAS - ALA), June 2012.
Intrinsically Motivated Model Learning for a Developing Curious Agent 2012
Todd Hester and Peter Stone, In The Eleventh International Conference on Development and Learning (ICDL), Nov 2012.
TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains 2012
Todd Hester, PhD Thesis, The University of Texas at Austin. Code available at: http://www.ros.org/wiki/rl-texplore-ros-pkg.
RTMBA: A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control 2012
Todd Hester, Michael Quinlan, and Peter Stone, In IEEE International Conference on Robotics and Automation (ICRA), May 2012.
TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots 2012
Todd Hester and Peter Stone, Machine Learning (2012).
A Real-Time Model-Based Reinforcement Learning Architecture for Robot Control 2011
Todd Hester, Michael Quinlan, and Peter Stone,
Generalized Model Learning for Reinforcement Learning on a Humanoid Robot 2010
Todd Hester, Michael Quinlan, and Peter Stone, In International Conference on Robotics and Automation 2010.
Real Time Targeted Exploration in Large Domains 2010
Todd Hester and Peter Stone, In Proceedings of the Ninth International Conference on Development and Learning (ICDL 2010), August 2010.
An Empirical Comparison of Abstraction in Models of Markov Decision Processes 2009
Todd Hester and Peter Stone, In Proceedings of the ICML/UAI/COLT Workshop on Abstraction in Reinforcement Learning, June 2009.
Generalized Model Learning for Reinforcement Learning in Factored Domains 2009
Todd Hester and Peter Stone, In The Eighth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2009.