Peter Stone's Selected Publications



Learning and Using Models

Learning and Using Models.
Todd Hester and Peter Stone.
In Marco Wiering and Martijn van Otterlo, editors, Reinforcement Learning: State of the Art, Springer Verlag, Berlin, Germany, 2011.

Download

[PDF] 474.7kB  [postscript] 1.0MB

Abstract

As opposed to model-free RL methods, which learn directly from experience in the domain, model-based methods learn a model of the transition and reward functions of the domain on-line and plan a policy using this model. Once the method has learned an accurate model, it can plan an optimal policy on this model without any further experience in the world. Therefore, when model-based methods are able to learn a good model quickly, they frequently have improved sample efficiency over model-free methods, which must continue taking actions in the world for values to propagate back to previous states. Another advantage of model-based methods is that they can use their models to plan multi-step exploration trajectories. In particular, many methods drive the agent to explore where there is uncertainty in the model, so as to learn the model as fast as possible. In this chapter, we survey some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models. In addition, we examine the typical architectures for combining model learning and planning, which vary depending on whether the designer wants the algorithm to run on-line, in batch mode, or in real-time. One of the main performance criteria for these algorithms is sample complexity, or how many actions the algorithm must take to learn. We examine the sample efficiency of a few methods, which are highly dependent on having intelligent exploration mechanisms. We survey some approaches to solving the exploration problem, including Bayesian methods that maintain a belief distribution over possible models to explicitly measure uncertainty in the model. We show some empirical comparisons of various model-based and model-free methods on two example domains before concluding with a survey of current research on scaling these methods up to larger domains with improved sample and computational complexity.
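To make the core loop concrete, below is a minimal sketch (in Python) of a tabular model-based agent in the spirit of the abstract: it estimates transition and reward functions from experience counts and plans on that learned model with value iteration, using R-max-style optimism for state-action pairs visited too few times, so that planning itself drives exploration toward model uncertainty. The class name, environment interface, and all constants are illustrative assumptions, not the specific algorithms surveyed in the chapter.

from collections import defaultdict

# Illustrative sketch only: tabular model learning + value-iteration planning
# with optimistic values for "unknown" state-action pairs (R-max-style).
class ModelBasedAgent:
    def __init__(self, states, actions, gamma=0.95, known_threshold=5, r_max=1.0):
        self.states, self.actions = states, actions
        self.gamma = gamma
        self.known_threshold = known_threshold  # visits before (s, a) counts as "known"
        self.r_max = r_max                      # optimistic reward bound for unknown pairs
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': visit count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> summed observed reward
        self.q = defaultdict(float)                          # (s, a) -> planned action value

    def update_model(self, s, a, r, s2):
        # Learn the transition and reward model from one experience tuple.
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r

    def plan(self, sweeps=50):
        # Value iteration on the learned model; under-visited pairs get an
        # optimistic value, which steers the agent toward model uncertainty.
        for _ in range(sweeps):
            for s in self.states:
                for a in self.actions:
                    n = sum(self.counts[(s, a)].values())
                    if n < self.known_threshold:
                        self.q[(s, a)] = self.r_max / (1 - self.gamma)
                        continue
                    r_hat = self.reward_sum[(s, a)] / n
                    expected_next = sum(
                        (c / n) * max(self.q[(s2, a2)] for a2 in self.actions)
                        for s2, c in self.counts[(s, a)].items()
                    )
                    self.q[(s, a)] = r_hat + self.gamma * expected_next

    def act(self, s):
        # Greedy action with respect to the planned values.
        return max(self.actions, key=lambda a: self.q[(s, a)])

In a typical on-line loop the agent would call update_model after every environment step, re-plan periodically, and select actions with act; the chapter surveys richer model classes, planning methods, and Bayesian exploration strategies well beyond this tabular sketch.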

BibTeX Entry

@inCollection{RLSOTA11,
    author =    {Todd Hester and Peter Stone},
    title =     {Learning and Using Models},
    booktitle = {Reinforcement Learning: State of the Art},
    editor =    {Marco Wiering and Martijn van Otterlo},
    year =      {2011},
    address =   {Berlin, Germany},
    publisher = {Springer Verlag},
abstract = "As opposed to model-free RL methods, which learn directly from 
experience in the domain, model-based methods learn a model of the transition 
and reward functions of the domain on-line and plan a policy using this model.
Once the method has learned an accurate model, it can plan 
an optimal policy on this model without any further experience in the world.
Therefore, when model-based methods are able to learn a good model quickly,
they frequently have improved sample efficiency over model-free methods, 
which must continue taking actions in the world for values to 
propagate back to previous states. 
Another advantage of model-based methods is that they can use
their models to plan multi-step exploration trajectories. In particular,
many methods drive the agent to explore where there is uncertainty in the model,
so as to learn the model as fast as possible.
In this chapter, we 
survey some of the types of models used in model-based methods and ways of learning
them, as well as methods for planning on these models.
In addition, we examine the typical architectures for
combining model learning and planning, which vary depending on whether the
designer wants the algorithm to run on-line, in batch mode, or in
real-time. 
One of the main performance criteria for these algorithms
is sample complexity, or how many actions the
algorithm must take to learn. We examine the sample efficiency of a few methods,
which are highly dependent on having intelligent exploration mechanisms. We 
survey some approaches to solving the exploration problem, including Bayesian
methods that maintain a belief distribution over possible models to explicitly
measure uncertainty in the model. 
We show some empirical comparisons of various model-based and model-free 
methods on two example domains before concluding with a survey of current
research on scaling these methods up to larger domains with
improved sample and computational complexity.",
}
