Peter Stone's Selected Publications



Intrinsically Motivated Model Learning for a Developing Curious Agent

Todd Hester and Peter Stone. Intrinsically Motivated Model Learning for a Developing Curious Agent. In AAMAS Adaptive Learning Agents (ALA) Workshop, June 2012.

Download

[PDF] 231.9 kB  [postscript] 7.0 MB

Abstract

Reinforcement Learning (RL) agents could benefit society by learning tasks that require learning and adaptation. However, learning these tasks efficiently typically requires a well-engineered reward function. Intrinsic motivation can be used to drive an agent to learn useful models of domains with limited or no external reward function. The agent can later plan on its learned model to perform tasks in the domain if given a reward function. This paper presents the TEXPLORE with Variance-And-Novelty-Intrinsic-Rewards algorithm (TEXPLORE-VANIR), an intrinsically motivated model-based RL algorithm. The algorithm learns models of the transition dynamics of a domain using decision trees. It calculates two different intrinsic rewards from this model: one to explore where the model is uncertain, and one to acquire novel experiences that the model has not yet been trained on. This paper presents experiments demonstrating that the combination of these two intrinsic rewards enables the algorithm to learn an accurate model of a domain with no external rewards and that the learned model can be used afterward to perform tasks in the domain. While learning the model, the agent explores the domain in a developing and curious way, progressively learning more complex skills. In addition, the experiments show that combining the agent’s intrinsic rewards with external task rewards enables the agent to learn faster than using external rewards alone.
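
Illustrative Code Sketch

The abstract describes two intrinsic rewards computed from a learned decision-tree transition model: one rewarding state-actions where the model is uncertain, and one rewarding experiences the model has not yet been trained on. The Python sketch below is only an illustration of that idea under assumptions made for this page, not the paper's implementation; the class name IntrinsicModel, the coefficients v_coef and n_coef, the bootstrapped tree ensemble, and the L1 novelty measure are all hypothetical choices.

# Illustrative sketch only (hypothetical names and formulas, not the paper's code):
# an ensemble of decision trees predicts next-state changes, disagreement among
# the trees stands in for model uncertainty, and L1 distance to the nearest
# stored experience stands in for novelty.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class IntrinsicModel:
    def __init__(self, n_trees=5, v_coef=1.0, n_coef=1.0):
        self.trees = [DecisionTreeRegressor(random_state=i) for i in range(n_trees)]
        self.v_coef, self.n_coef = v_coef, n_coef
        self.X, self.y = [], []  # experiences the model has been trained on

    def add_experience(self, state, action, next_state):
        # Store (state, action) -> state-change targets and refit the ensemble
        # on bootstrap resamples so the trees can disagree.
        self.X.append(np.append(state, action))
        self.y.append(np.asarray(next_state, dtype=float) - np.asarray(state, dtype=float))
        X, y = np.array(self.X), np.array(self.y)
        for i, tree in enumerate(self.trees):
            idx = np.random.RandomState(i).randint(0, len(X), len(X))
            tree.fit(X[idx], y[idx])

    def intrinsic_reward(self, state, action):
        x = np.append(state, action).reshape(1, -1)
        preds = np.stack([tree.predict(x)[0] for tree in self.trees])
        variance = preds.var(axis=0).sum()                         # "explore where uncertain"
        novelty = np.abs(np.array(self.X) - x).sum(axis=1).min()   # "seek unseen experiences"
        return self.v_coef * variance + self.n_coef * novelty

# Example: after two nearby experiences, a far-away query earns a larger bonus.
model = IntrinsicModel()
model.add_experience([0.0, 0.0], 1, [1.0, 0.0])
model.add_experience([1.0, 0.0], 1, [2.0, 0.0])
print(model.intrinsic_reward([1.0, 0.0], 1))  # near the training data: small bonus
print(model.intrinsic_reward([8.0, 0.0], 1))  # far from it: larger bonus

With the two nearby experiences above, querying a state far from the training data returns a larger intrinsic bonus than querying near it, which is the qualitative behavior the abstract attributes to the combined intrinsic reward.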

BibTeX Entry

@InProceedings{ALA12-hester,
  author = "Todd Hester and Peter Stone",
  title = "Intrinsically Motivated Model Learning for a Developing Curious Agent",
  booktitle = "AAMAS Adaptive Learning Agents (ALA) Workshop",
  location = "Valencia, Spain",
  month = "June",
  year = "2012",
  abstract = "Reinforcement Learning (RL) agents could benefit society
                  by learning tasks that require learning and
                  adaptation. However, learning these tasks
                  efficiently typically requires a well-engineered
                  reward function. Intrinsic motivation can be used to
                  drive an agent to learn useful models of domains
                  with limited or no external reward function. The
                  agent can later plan on its learned model to perform
                  tasks in the domain if given a reward function. This
                  paper presents the TEXPLORE with
                  Variance-And-Novelty-Intrinsic-Rewards algorithm
                  (TEXPLORE-VANIR), an intrinsically motivated
                  model-based RL algorithm. The algorithm learns models of
                  the transition dynamics of a domain using decision
                  trees. It calculates two different intrinsic rewards
                  from this model: one to explore where the model is
                  uncertain, and one to acquire novel experiences that
                  the model has not yet been trained on. This paper
                  presents experiments demonstrating that the
                  combination of these two intrinsic rewards enables
                  the algorithm to learn an accurate model of a domain
                  with no external rewards and that the learned model
                  can be used afterward to perform tasks in the
                  domain. While learning the model, the agent explores
                  the domain in a developing and curious way,
                  progressively learning more complex skills. In
                  addition, the experiments show that combining the
                  agent's intrinsic rewards with external task rewards
                  enables the agent to learn faster than using
                  external rewards alone.",
}

Generated by bib2html.pl (written by Patrick Riley) on Thu Sep 27, 2012 05:34:00