Peter Stone's Selected Publications


Reinforcement Learning from Human Reward: Discounting in Episodic Tasks

W. Bradley Knox and Peter Stone. Reinforcement Learning from Human Reward: Discounting in Episodic Tasks. In Proceedings of the 21st IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man), September 2012.
CoTeSys Cognitive Robotics BEST PAPER AWARD FINALIST.
Ro-Man 2012

Abstract

Several studies have demonstrated that teaching agents by human-generated reward can be a powerful technique. However, the algorithmic space for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward in goal-based, episodic tasks, we investigate how anticipated future rewards should be discounted to create behavior that performs well on the task that the human trainer intends to teach. We identify a “positive circuits” problem with low discounting (i.e., high discount factors) that arises from an observed bias among humans towards giving positive reward. Empirical analyses indicate that high discounting (i.e., low discount factors) of human reward is necessary in goal-based, episodic tasks and lend credence to the existence of the positive circuits problem.
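
To make the "positive circuits" intuition concrete, here is a minimal sketch that is not taken from the paper. It compares the discounted return of a short path that reaches the goal (after which the episode ends and no more reward arrives) with the return of repeating a net-positive loop forever. The reward values, the names GOAL_PATH and CIRCUIT, and the helper functions are all hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def looping_return(circuit_rewards, gamma):
    """Return of repeating a reward circuit forever (requires 0 <= gamma < 1).

    One lap is worth discounted_return(circuit_rewards, gamma); each later lap
    is discounted by gamma**len(circuit_rewards), giving a geometric series.
    """
    one_lap = discounted_return(circuit_rewards, gamma)
    return one_lap / (1.0 - gamma ** len(circuit_rewards))


def prefers_circuit(goal_rewards, circuit_rewards, gamma):
    """True if looping forever is worth more than finishing the episode."""
    return looping_return(circuit_rewards, gamma) > discounted_return(goal_rewards, gamma)


# Hypothetical human rewards (assumed values, not data from the paper).
GOAL_PATH = [1.0, 1.0, 1.0]  # three praised steps, then the episode ends
CIRCUIT = [1.0, -0.5]        # a two-step loop whose net reward is positive

if __name__ == "__main__":
    for gamma in (0.1, 0.5, 0.9, 0.99):
        choice = "circuit" if prefers_circuit(GOAL_PATH, CIRCUIT, gamma) else "goal"
        print(f"gamma={gamma}: prefers {choice}")
```

Under these assumed numbers, a discount factor near 1 makes the endless loop look far more valuable than finishing the task, while a small discount factor favors the goal path, which is consistent with the abstract's claim that high discounting suits goal-based, episodic tasks.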

BibTeX Entry

@InProceedings{ROMAN12-knox,
  author = {W. Bradley Knox and Peter Stone},
  title = {Reinforcement Learning from Human Reward: Discounting in Episodic Tasks},
  booktitle = {Proceedings of the 21st IEEE International Symposium on Robot and Human Interactive Communication (Ro-Man)},
  location = {Paris, France},
  month = {September},
  year = {2012},
  abstract = {
Several studies have demonstrated that teaching agents by human-generated reward can be a powerful technique. However, the algorithmic space for learning from human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward in goal-based, episodic tasks, we investigate how anticipated future rewards should be discounted to create behavior that performs well on the task that the human trainer intends to teach. We identify a “positive circuits” problem with low discounting (i.e., high discount factors) that arises from an observed bias among humans towards giving positive reward. Empirical analyses indicate that high discounting (i.e., low discount factors) of human reward is necessary in goal-based, episodic tasks and lend credence to the existence of the positive circuits problem.
  },
  wwwnote={CoTeSys Cognitive Robotics <b>BEST PAPER AWARD FINALIST</b>.<br>
<a href="http://www.ro-man2012.org/">Ro-Man 2012</a>
}
}
