Peter Stone's Selected Publications

A synthesis of automated planning and reinforcement learning for efficient, robust decision-making

Matteo Leonetti, Luca Iocchi, and Peter Stone.
Artificial Intelligence, 241:103–130, September 2016.

Abstract

Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter on interaction with the world, and experience. Planning allows robots to carry out different tasks in the same domain, without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge, and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment, and increase the reliability of the decision-making process. We demonstrate the effectiveness of the proposed method on a service robot, carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, even when seeded with the same model that was used for planning alone, however, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes.

BibTeX Entry

@article{AIJ16-leonetti,
title = "A synthesis of automated planning and reinforcement learning for 
efficient, robust decision-making",
journal = "Artificial Intelligence",
volume = "241",
pages = "103--130",
year = "2016",
month = "September",
issn = "0004-3702",
doi = "10.1016/j.artint.2016.07.004",
url = "http://www.sciencedirect.com/science/article/pii/S0004370216300819",
author = "Matteo Leonetti and Luca Iocchi and Peter Stone",
abstract = {Automated planning and reinforcement learning are 
characterized by complementary views on decision making: the former relies on 
previous knowledge and computation, while the latter on interaction with the 
world, and experience. Planning allows robots to carry out different tasks in 
the same domain, without the need to acquire knowledge about each one of them, 
but relies strongly on the accuracy of the model. Reinforcement learning, on the 
other hand, does not require previous knowledge, and allows robots to robustly 
adapt to the environment, but often necessitates an infeasible amount of 
experience. We present Domain Approximation for Reinforcement LearnING 
(DARLING), a method that takes advantage of planning to constrain the behavior 
of the agent to reasonable choices, and of reinforcement learning to adapt to 
the environment, and increase the reliability of the decision-making process. We 
demonstrate the effectiveness of the proposed method on a service robot, 
carrying out a variety of tasks in an office building. We find that when the 
robot makes decisions by planning alone on a given model it often fails, and 
when it makes decisions by reinforcement learning alone it often cannot complete 
its tasks in a reasonable amount of time. When employing DARLING, even when 
seeded with the same model that was used for planning alone, however, the robot 
can quickly learn a behavior to carry out all the tasks, improves over time, and 
adapts to the environment as it changes.},
}
