UTCS Artificial Intelligence
On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (2016)
Matthew Hausknecht and Peter Stone
Temporal-difference-based deep reinforcement learning methods have typically been driven by off-policy, bootstrapped Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets yields superior performance and stability compared to using exclusively one or the other. The same technique applied to DQN in a discrete action space drastically slows down learning. Our findings raise questions about the nature of on-policy and off-policy bootstrap and Monte Carlo updates and their relationship to deep reinforcement learning methods.
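The mixed-target idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, signature, and the fixed mixing weight `beta` are assumptions for the sake of the example. It blends the on-policy Monte Carlo return of an episode with the off-policy one-step bootstrap target.

```python
import numpy as np

def mixed_update_targets(rewards, next_q, dones, gamma=0.99, beta=0.2):
    """Blend on-policy Monte Carlo returns with off-policy bootstrap targets.

    rewards: per-step rewards of one episode
    next_q:  critic's Q estimate for each next state(-action)
    dones:   episode-termination flags (1.0 at terminal steps)
    beta:    weight on the Monte Carlo return (beta=0 -> pure bootstrap)
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=float)

    # On-policy Monte Carlo returns, accumulated backward over the episode.
    mc = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        mc[t] = running

    # Off-policy one-step bootstrap targets (Q-learning / DDPG style).
    bootstrap = rewards + gamma * next_q * (1.0 - dones)

    # Mixed target: beta * Monte Carlo + (1 - beta) * bootstrap.
    return beta * mc + (1.0 - beta) * bootstrap
```

With `beta=0` this reduces to the standard bootstrapped target; with `beta=1` it is a pure Monte Carlo target; intermediate values interpolate between the two, which is the regime the abstract reports as most stable for DDPG.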
View: PDF, HTML
Citation: In Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop, New York, July 2016.
Bibtex:
@inproceedings{DeepRL16-hausknecht,
  title={On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning},
  author={Matthew Hausknecht and Peter Stone},
  booktitle={Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop},
  month={July},
  address={New York},
  url={http://www.cs.utexas.edu/users/ai-lab?hausknecht:deeprl16},
  year={2016}
}
People
Peter Stone
Faculty
pstone [at] cs utexas edu
Areas of Interest
Deep Learning
Reinforcement Learning
Labs
Learning Agents