TEXPLORE is a Reinforcement Learning (RL) algorithm, that is, a method by which agents learn to perform sequential decision-making tasks through interaction with their environment. This research focuses on applying RL to real-world problems, particularly learning on robots. Enabling robots to learn will make them more generally useful, as they will not require pre-programming for every task and environment.
In RL, an agent is in some state in the world (e.g. a particular chess board configuration or a location in a city) and has some set of actions it can take (e.g. chess moves, or turns at an intersection). Upon taking an action, it reaches a new state and receives a scalar reward (e.g. +1 for winning the chess game, -1 for losing, and 0 otherwise; or the negative of the time each road segment took). The goal of the agent is to learn which action to take in each state to maximize its cumulative reward over time.
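The interaction described above can be sketched as a short loop. This is only an illustrative sketch: the `Corridor` world, its reward of +1 at the goal, and the `run_episode` helper are hypothetical names invented for this example and are not part of TEXPLORE.

```python
class Corridor:
    """Hypothetical toy world: states 0..length-1, with the goal at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); walls clamp the movement."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # scalar reward: +1 at the goal, 0 otherwise
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let `policy` (a function from state to action) act until the goal or a step limit."""
    state = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(state)                  # agent picks an action in its state
        state, reward, done = env.step(action)  # world returns new state and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps
```

For example, `run_episode(Corridor(), lambda s: +1)` runs a fixed "always move right" policy. A learning agent would replace that fixed policy with one it improves from the rewards it observes.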
There are a number of issues with applying RL to real-world problems such as robot control. First, learning must happen with a limited number of actions. Methods that take thousands or millions of actions to learn are not feasible for a robot, as the robot is likely to break, wear out, run out of battery power, or overheat before that many actions can be taken. Second, learning must take place in real time. We would like the RL agent to be in continual control of the robot, rather than controlling it for short periods followed by long pauses while it computes what action to take next. Finally, there are issues with handling the continuous state space of robots and with dealing with the delays that many mechanical actuators have.
To address these issues, we have developed an algorithm called TEXPLORE. It is a model-based RL algorithm, which means it learns a model of the state transition and reward dynamics of the domain and then uses that model to plan a policy, enabling it to learn in fewer actions than many model-free approaches. It also uses a real-time architecture that performs model learning and planning in parallel threads, so the agent can act in real time.
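To make the model-based idea concrete, here is a minimal sketch of the general pattern: estimate transition and reward dynamics from experience, then plan on the learned model. A tabular count-based model and value iteration are swapped in purely for illustration. They are not TEXPLORE's own model learner or planner, and the names `TabularModel` and `plan`, the five-state corridor, and the discount factor of 0.9 are all assumptions made for this example.

```python
from collections import defaultdict


class TabularModel:
    """Illustrative count-based model of transitions and rewards."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                      # (s, a) -> summed reward
        self.visits = defaultdict(int)                            # (s, a) -> visit count

    def update(self, s, a, r, s_next):
        """Record one observed transition (s, a) -> (r, s_next)."""
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def predict(self, s, a):
        """Return (next-state probabilities, mean reward); empty if never tried."""
        n = self.visits[(s, a)]
        if n == 0:
            return {}, 0.0
        probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
        return probs, self.reward_sum[(s, a)] / n


def plan(model, states, actions, terminal, gamma=0.9, sweeps=100):
    """Value iteration on the learned model; returns a greedy policy and values."""
    values = {s: 0.0 for s in states}

    def q(s, a):
        probs, r = model.predict(s, a)
        return r + gamma * sum(p * values[s2] for s2, p in probs.items())

    for _ in range(sweeps):
        for s in states:
            if s not in terminal:
                values[s] = max(q(s, a) for a in actions)
    policy = {s: max(actions, key=lambda a: q(s, a))
              for s in states if s not in terminal}
    return policy, values


# Experience: try each action once in each non-goal state of a 5-state corridor
# (states 0..4, goal at 4, reward +1 on reaching the goal).
states, actions, goal = list(range(5)), (-1, +1), 4
model = TabularModel()
for s in range(4):
    for a in actions:
        s_next = min(max(s + a, 0), 4)
        model.update(s, a, 1.0 if s_next == goal else 0.0, s_next)

policy, values = plan(model, states, actions, terminal={goal})
```

In this sketch, eight real actions are enough for the planner to find a policy that moves right in every state, because all further "trial and error" happens inside the learned model rather than in the world. That is the sense in which a model-based method can need fewer real actions. The parallel-thread architecture is not shown here.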
I've released a ROS package with the TEXPLORE source code that can be easily applied to any robot running ROS.
A complete description of the TEXPLORE algorithm, with videos detailing each aspect of it, is available on my webpage.
Video: Learning to Score Penalty Kicks via Reinforcement Learning
The accompanying video for our ICRA 2010 paper, in which our learning algorithm controls the robot as it learns to score penalty kicks.