Peter Stone's Selected Publications



TD Learning with Constrained Gradients

TD Learning with Constrained Gradients.
Ishan Durugkar and Peter Stone.
In Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017, December 2017.

Download

[PDF] (381.4 kB)

Abstract

Temporal Difference Learning with function approximation is known to be unstable. Previous work like GTD and GTD2 has presented alternative objectives that are stable to minimize for policy evaluation. However, for control, TD-learning with neural networks requires various tricks such as using a target network that updates slowly (DQN). In this work we propose a constraint on the TD update that minimizes change to the target values. This constraint can be applied to the gradients of any TD objective, and can be easily applied to nonlinear function approximation. We validate this update by applying our technique to deep Q-learning, and training without a target network. We also show that adding this constraint on Baird's counterexample keeps Constrained TD-learning from diverging.
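The abstract describes constraining the TD update so that it (to first order) leaves the bootstrap target values unchanged. As an illustration only, the sketch below applies that idea to linear TD(0): the raw semi-gradient update is projected orthogonal to the gradient of the target V(s'). The function names are assumptions for this sketch, and the exact projection used in the paper may differ.

```python
def dot(u, v):
    """Inner product of two equal-length vectors (plain Python lists)."""
    return sum(a * b for a, b in zip(u, v))

def constrained_td_update(w, phi_s, phi_s_next, reward, gamma=0.99, alpha=0.1):
    """One constrained TD(0) step with linear values V(s) = dot(w, phi(s)).

    Sketch of the constraint described in the abstract: the semi-gradient
    TD update is projected to be orthogonal to the gradient of the target
    value V(s') with respect to w, so the step does not (to first order)
    move the target it is bootstrapping from.
    """
    td_error = reward + gamma * dot(w, phi_s_next) - dot(w, phi_s)
    grad = [td_error * x for x in phi_s]       # raw semi-gradient direction
    target_grad = list(phi_s_next)             # gradient of V(s') wrt w
    norm_sq = dot(target_grad, target_grad)
    if norm_sq > 0:
        # Remove the component of the update that would change V(s').
        coef = dot(grad, target_grad) / norm_sq
        grad = [g - coef * t for g, t in zip(grad, target_grad)]
    return [wi + alpha * gi for wi, gi in zip(w, grad)]
```

With nonlinear function approximation (e.g. deep Q-learning, as in the paper), the same projection would be applied to the parameter gradients, which is what allows training without a slowly updated target network.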

BibTeX Entry

@InProceedings{NIPS17-ishand,
  author = {Ishan Durugkar and Peter Stone},
  title = {TD Learning with Constrained Gradients},
  booktitle = {Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017},
  address = {Long Beach, CA, USA},
  month = {December},
  year = {2017},
  abstract = {
  Temporal Difference Learning with function approximation is known to be
  unstable. Previous work like GTD and
  GTD2 has presented alternative objectives that are
  stable to minimize for policy evaluation. However, for control, TD-learning
  with neural networks requires various tricks such as using a target network
  that updates slowly (DQN). In this work we propose a
  constraint on the TD update that minimizes change to the target values. This
  constraint can be applied to the gradients of any TD objective, and can be
  easily applied to nonlinear function approximation. We validate this update by
  applying our technique to deep Q-learning, and training without a target
  network. We also show that adding this constraint on Baird's counterexample
  keeps Constrained TD-learning from diverging.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Mon Mar 25, 2024 00:05:18