UTCS Artificial Intelligence
TD Learning with Constrained Gradients (2017)
Ishan Durugkar
and
Peter Stone
Temporal difference (TD) learning with function approximation is known to be unstable. Prior work such as GTD and GTD2 introduced alternative objectives that can be minimized stably for policy evaluation. For control, however, TD learning with neural networks relies on tricks such as a slowly updated target network (as in DQN). In this work we propose a constraint on the TD update that minimizes the change to the target values. The constraint can be applied to the gradients of any TD objective and extends easily to nonlinear function approximation. We validate the update by applying it to deep Q-learning and training without a target network. We also show that, on Baird's counterexample, the constraint keeps Constrained TD-learning from diverging.
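The abstract describes the constraint only at a high level. The NumPy sketch below illustrates one reading of it, which is an assumption and not necessarily the paper's exact formulation: the semi-gradient TD update direction is projected to be orthogonal to the gradient of the target value gamma * v(s'), so the update leaves the target unchanged to first order. It runs this on Baird's counterexample using the common textbook setup from Sutton and Barto (feature layout, initial weights (1,1,1,1,1,1,10,1), gamma = 0.99, alpha = 0.01), with a synchronous expected update in the style of their semi-gradient DP variant. The helper names `project_out`, `expected_td_step`, and `run` are hypothetical.

```python
import numpy as np

# Baird's counterexample features (textbook layout, assumed here):
# states 0..5: v = 2*w[i] + w[7]; state 6: v = w[6] + 2*w[7].
N_STATES, N_FEATURES = 7, 8
PHI = np.zeros((N_STATES, N_FEATURES))
for i in range(6):
    PHI[i, i] = 2.0
    PHI[i, 7] = 1.0
PHI[6, 6] = 1.0
PHI[6, 7] = 2.0

GAMMA = 0.99
ALPHA = 0.01


def project_out(g, direction):
    """Remove from g its component along `direction` (hypothetical helper)."""
    norm_sq = direction @ direction
    if norm_sq == 0.0:
        return g
    return g - (g @ direction) / norm_sq * direction


def expected_td_step(w, constrained):
    """One synchronous expected semi-gradient TD(0) update under the target
    policy, which always moves to the last state (index 6) with reward 0."""
    next_phi = PHI[6]
    target = GAMMA * (next_phi @ w)
    update = np.zeros_like(w)
    for s in range(N_STATES):
        delta = target - PHI[s] @ w   # TD error for state s
        g = delta * PHI[s]            # semi-gradient TD direction
        if constrained:
            # Keep the update orthogonal to the gradient of the target
            # gamma * v(s'); since v is linear in w here, v(s') is then
            # held fixed exactly (up to floating-point rounding).
            g = project_out(g, GAMMA * next_phi)
        update += g
    return w + ALPHA / N_STATES * update


def run(constrained, steps=5000):
    w = np.array([1, 1, 1, 1, 1, 1, 10, 1], dtype=float)
    for _ in range(steps):
        w = expected_td_step(w, constrained)
    return w


w_plain = run(constrained=False)
w_constrained = run(constrained=True)
print("plain TD weight norm:      ", np.linalg.norm(w_plain))
print("constrained TD weight norm:", np.linalg.norm(w_constrained))
print("constrained v(s7):", PHI[6] @ w_constrained)
```

In this linear setting the projection pins v(s7) at its initial value of 12, and the remaining update is gradient descent on a convex quadratic within that constraint, so the constrained weights stay bounded while the unconstrained expected update grows without bound. With a neural network, the same projection would only hold the target value fixed to first order.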
View: PDF, HTML
Citation: In Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017, Long Beach, CA, USA, December 2017.
Bibtex:
@inproceedings{NIPS17-ishand,
  title={TD Learning with Constrained Gradients},
  author={Ishan Durugkar and Peter Stone},
  booktitle={Proceedings of the Deep Reinforcement Learning Symposium, NIPS 2017},
  month={December},
  address={Long Beach, CA, USA},
  url="http://www.cs.utexas.edu/users/ai-lab?NIPS17-ishand",
  year={2017}
}
People
Ishan Durugkar
Ph.D. Student
ishand [at] cs utexas edu
Peter Stone
Faculty
pstone [at] cs utexas edu
Areas of Interest
Reinforcement Learning
Labs
Learning Agents