Generalized Model Learning for Reinforcement Learning on a Humanoid Robot (2010)
Reinforcement learning (RL) algorithms have long been promising methods for enabling an autonomous robot to improve its behavior on sequential decision-making tasks. The obvious enticement is that the robot should be able to improve its own behavior without the need for detailed step-by-step programming. However, for RL to reach its full potential, the algorithms must be sample efficient: they must learn competent behavior from very few real-world trials. From this perspective, model-based methods, which use experiential data more efficiently than model-free approaches, are appealing. But they often require exhaustive exploration to learn an accurate model of the domain. In this paper, we present an algorithm, Reinforcement Learning with Decision Trees (RL-DT), that uses decision trees to learn the model by generalizing the relative effect of actions across states. The agent explores the environment until it believes it has a reasonable policy. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. We compare RL-DT against standard model-free and model-based learning methods, and demonstrate its effectiveness on an Aldebaran Nao humanoid robot scoring goals in a penalty kick scenario.
In Proceedings of the International Conference on Robotics and Automation (ICRA), 2010.

Todd Hester (todd [at] cs utexas edu)
Michael Quinlan (mquinlan [at] cs utexas edu)
Peter Stone (pstone [at] cs utexas edu)
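
To make the abstract's central idea concrete, below is a minimal sketch of relative-effect model learning in the style of RL-DT. This is an illustrative reconstruction, not the authors' code: it assumes discrete states represented as integer vectors, substitutes scikit-learn trees for the C4.5-style decision trees used in the paper, and all names here (RelativeEffectModel, add_experience, and so on) are hypothetical.

# Minimal, hypothetical sketch of RL-DT-style model learning (not the
# authors' implementation). Assumes discrete integer-vector states and
# uses scikit-learn trees in place of the paper's C4.5-style trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

class RelativeEffectModel:
    """Learns trees that predict the relative effect of an action
    (the per-dimension change s' - s) and the reward, so effects
    observed in a few states generalize to unvisited states."""

    def __init__(self, n_state_dims):
        self.X = []        # (state, action) training inputs
        self.deltas = []   # observed relative effects s' - s
        self.rewards = []  # observed rewards
        # One classification tree per state dimension, plus a reward tree.
        self.effect_trees = [DecisionTreeClassifier()
                             for _ in range(n_state_dims)]
        self.reward_tree = DecisionTreeRegressor()

    def add_experience(self, s, a, s_next, r):
        """Record one transition; note we store the change in state,
        not the absolute next state."""
        self.X.append(np.append(s, a))
        self.deltas.append(np.asarray(s_next) - np.asarray(s))
        self.rewards.append(r)

    def train(self):
        """Refit all trees on the experience gathered so far."""
        X = np.array(self.X)
        D = np.array(self.deltas)
        for dim, tree in enumerate(self.effect_trees):
            tree.fit(X, D[:, dim])
        self.reward_tree.fit(X, self.rewards)

    def predict(self, s, a):
        """Predict the next state and reward for taking action a in s."""
        x = np.append(s, a).reshape(1, -1)
        delta = np.array([t.predict(x)[0] for t in self.effect_trees])
        return np.asarray(s) + delta, self.reward_tree.predict(x)[0]

# Hypothetical usage: two transitions where action 1 increments dimension 1.
model = RelativeEffectModel(n_state_dims=2)
model.add_experience([0, 0], 1, [0, 1], -1.0)
model.add_experience([3, 5], 1, [3, 6], -1.0)
model.train()
next_state, reward = model.predict([7, 2], 1)  # generalizes to unseen [7, 2]

In the paper itself the trees predict probability distributions over effects rather than a single outcome, and the agent plans over the learned model, exploring until the resulting policy appears good enough before switching to exploitation. The sketch keeps only the generalization step, which is what lets a handful of real-world trials, such as penalty kicks, inform predictions about many unvisited states.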