Peter Stone's Selected Publications



Augmenting Reinforcement Learning with Human Feedback

W. Bradley Knox and Peter Stone. Augmenting Reinforcement Learning with Human Feedback. In ICML 2011 Workshop on New Developments in Imitation Learning, July 2011.

Download

[PDF] 333.3kB  [postscript] 2.5MB

Abstract

As computational agents are increasingly used beyond research labs, their success will depend on their ability to learn new skills and adapt to their dynamic, complex environments. If human users --- without programming skills --- can transfer their task knowledge to agents, learning can accelerate dramatically, reducing costly trials. The TAMER framework guides the design of agents whose behavior can be shaped through signals of approval and disapproval, a natural form of human feedback. More recently, TAMER+RL was introduced to enable human feedback to augment a traditional reinforcement learning (RL) agent that learns from a Markov decision process's (MDP) reward signal. Using a reimplementation of TAMER and TAMER+RL, we address limitations of prior work, contributing in two critical directions. First, the four successful techniques for combining a human reinforcement with RL from prior TAMER+RL work are tested on a second task, and these techniques' sensitivities to parameter changes are analyzed. Together, these examinations yield more general and prescriptive conclusions to guide others who wish to incorporate human knowledge into an RL algorithm. Second, TAMER+RL has thus far been limited to a sequential setting, in which training occurs before learning from MDP reward. We modify the sequential algorithms to learn simultaneously from both sources, enabling the human feedback to come at any time during the reinforcement learning process. To enable simultaneous learning, we introduce a new technique that appropriately determines the magnitude of the human model's influence on the RL algorithm throughout time and state-action space.
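To make the combination described above concrete, the sketch below shows one simple way a learned model of human reinforcement can bias a tabular Q-learner's action selection, with the human model's influence decaying over time so MDP reward gradually dominates. This is an illustrative sketch only: the class name, the tabular representation of Q and H, the additive biasing, and the decay schedule are assumptions for exposition, not the TAMER+RL algorithms or the influence-weighting technique introduced in the paper.

# Illustrative sketch (not the paper's implementation): Q-learning whose
# action selection is biased by a learned model of human reinforcement (H),
# with an influence weight that shrinks over time. The combination scheme
# and decay schedule are assumptions for exposition only.
import random
from collections import defaultdict

class HumanBiasedQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.99, epsilon=0.1,
                 human_weight=1.0, weight_decay=0.999):
        self.actions = actions
        self.alpha = alpha                # Q-learning step size
        self.gamma = gamma                # discount factor
        self.epsilon = epsilon            # exploration rate
        self.human_weight = human_weight  # current influence of the human model
        self.weight_decay = weight_decay  # per-step decay of that influence
        self.Q = defaultdict(float)       # tabular estimates from MDP reward
        self.H = defaultdict(float)       # tabular model of human reinforcement

    def act(self, state):
        # Epsilon-greedy over Q plus a (decaying) human-model bias.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.Q[(state, a)]
                   + self.human_weight * self.H[(state, a)])

    def update_from_env(self, s, a, r, s_next):
        # Standard Q-learning update from the MDP reward signal.
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        td_target = r + self.gamma * best_next
        self.Q[(s, a)] += self.alpha * (td_target - self.Q[(s, a)])
        self.human_weight *= self.weight_decay  # shrink human influence over time

    def update_from_human(self, s, a, h):
        # Incorporate a scalar human approval/disapproval signal for (s, a),
        # which may arrive at any point during reinforcement learning.
        self.H[(s, a)] += self.alpha * (h - self.H[(s, a)])

Because update_from_human can be called whenever feedback arrives, the sketch learns from both sources simultaneously; a global decay on the human weight is only the crudest stand-in for a scheme that modulates influence across time and state-action space.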

BibTeX Entry

@InProceedings{ICML_IL11-knox,
  author="W.\ Bradley Knox and Peter Stone",
  title="Augmenting Reinforcement Learning with Human Feedback",
  booktitle="ICML 2011 Workshop on New Developments in Imitation Learning",
  month="July",
  year="2011",
  abstract={As computational agents are increasingly used beyond
      research labs, their success will depend on their ability to learn new
      skills and adapt to their dynamic, complex environments. If human
      users --- without programming skills --- can transfer their task
      knowledge to agents, learning can accelerate dramatically, reducing
      costly trials. The TAMER framework guides the design of agents whose
      behavior can be shaped through signals of approval and disapproval, a
      natural form of human feedback. More recently, TAMER+RL was introduced
      to enable human feedback to augment a traditional reinforcement
      learning (RL) agent that learns from a Markov decision process's (MDP)
      reward signal. Using a reimplementation of TAMER and TAMER+RL, we
      address limitations of prior work, contributing in two critical
      directions. First, the four successful techniques for combining a
      human reinforcement with RL from prior TAMER+RL work are tested on a
      second task, and these techniques' sensitivities to parameter changes
      are analyzed. Together, these examinations yield more general and
      prescriptive conclusions to guide others who wish to incorporate human
      knowledge into an RL algorithm. Second, TAMER+RL has thus far been
      limited to a sequential setting, in which training occurs before
      learning from MDP reward. We modify the sequential algorithms to learn
      simultaneously from both sources, enabling the human feedback to come
      at any time during the reinforcement learning process.  To enable
      simultaneous learning, we introduce a new technique that appropriately
      determines the magnitude of the human model's influence on the RL
      algorithm throughout time and state-action space.},
  wwwnote={<a href="http://www.robot-learning.de/Research/ICML2011">ICML
    2011 Workshop on New Developments in Imitation Learning</a>},
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Feb 13, 2012 13:14:41