Peter Stone's Selected Publications

Learning Exploration Strategies in Model-Based Reinforcement Learning

Learning Exploration Strategies in Model-Based Reinforcement Learning.
Todd Hester, Manuel Lopes, and Peter Stone.
In The Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2013.

Abstract

Reinforcement learning (RL) is a paradigm for learning sequential decision making tasks. However, typically the user must hand-tune exploration parameters for each different domain and/or algorithm that they are using. In this work, we present an algorithm called LEO for learning these exploration strategies on-line. This algorithm makes use of bandit-type algorithms to adaptively select exploration strategies based on the rewards received when following them. We show empirically that this method performs well across a set of five domains. In contrast, for a given algorithm, no set of parameters is best across all domains. Our results demonstrate that the LEO algorithm successfully learns the best exploration strategies on-line, increasing the received reward over static parameterizations of exploration and reducing the need for hand-tuning exploration parameters.
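The idea of using a bandit to choose among exploration strategies based on received reward can be illustrated with a small sketch. This is not the LEO algorithm from the paper: it assumes a standard UCB1 bandit, three hypothetical strategy names, and a stand-in `run_episode` function that returns a noisy reward in place of a real model-based RL episode. All class, function, and parameter names here are illustrative assumptions.

```python
import math
import random


class UCB1StrategySelector:
    """Hypothetical UCB1 bandit; each arm is one exploration strategy."""

    def __init__(self, strategies):
        self.strategies = list(strategies)
        self.counts = [0] * len(self.strategies)
        self.mean_returns = [0.0] * len(self.strategies)

    def select(self):
        # Try every strategy once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.mean_returns, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, episode_return):
        # Incremental average of the returns observed for this strategy.
        self.counts[arm] += 1
        self.mean_returns[arm] += (
            episode_return - self.mean_returns[arm]
        ) / self.counts[arm]


def run_episode(strategy_name, rng):
    """Stand-in for one RL episode (assumption): noisy return in [0, 1]."""
    expected = {"greedy": 0.3, "epsilon_greedy": 0.5, "optimistic": 0.8}
    noisy = rng.gauss(expected[strategy_name], 0.1)
    return min(1.0, max(0.0, noisy))


def main(num_episodes=500, seed=0):
    rng = random.Random(seed)
    selector = UCB1StrategySelector(["greedy", "epsilon_greedy", "optimistic"])
    for _ in range(num_episodes):
        arm = selector.select()
        selector.update(arm, run_episode(selector.strategies[arm], rng))
    return selector


if __name__ == "__main__":
    result = main()
    for name, count in zip(result.strategies, result.counts):
        print(name, count)
```

In this toy setting the selector concentrates its episodes on whichever strategy yields the highest average return, without any per-domain tuning of which strategy to use. A real agent would replace `run_episode` with actual episodes in the task domain and would need to account for returns that change as the agent's model improves, which plain UCB1 does not handle.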

BibTeX Entry

@InProceedings{AAMAS13-hester,
  author = "Todd Hester and Manuel Lopes and Peter Stone",
  title = "Learning Exploration Strategies in Model-Based Reinforcement Learning",
  booktitle = "The Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS)",
  location = "St. Paul, Minnesota",
  month = "May",
  year = "2013",
  abstract = "Reinforcement learning (RL) is a paradigm for learning sequential decision making tasks. However, typically the user must hand-tune exploration parameters for each different domain and/or algorithm that they are using. In this work, we present an algorithm called LEO for learning these exploration strategies on-line. This algorithm makes use of bandit-type algorithms to adaptively select exploration strategies based on the rewards received when following them. We show empirically that this method performs well across a set of five domains. In contrast, for a given algorithm, no set of parameters is best across all domains. Our results demonstrate that the LEO algorithm successfully learns the best exploration strategies on-line, increasing the received reward over static parameterizations of exploration and reducing the need for hand-tuning exploration parameters.",
}
