Peter Stone's Selected Publications

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning.
Matthew Hausknecht and Peter Stone.
In Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop, July 2016.

Abstract

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability compared to using exclusively one or the other. The same technique applied to DQN in a discrete action space drastically slows down learning. Our findings raise questions about the nature of on-policy and off-policy bootstrap and Monte Carlo updates and their relationship to deep reinforcement learning methods.
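The abstract contrasts two kinds of regression target for a Q-network or critic. The Python below is an illustrative sketch, not the authors' code. It computes three targets:

- an on-policy Monte Carlo return;
- an off-policy bootstrap Q-learning target;
- a mix of the two.

The helper names, the fixed mixing weight `beta`, and the linear (convex) form of the mix are assumptions made for illustration. `next_q` stands for whatever next-state value the learner supplies. For DQN that would be the maximum over a' of Q(s', a') from a target network. For DDPG it would be Q(s', mu(s')) from the target critic and actor.

```python
from typing import List, Sequence

# Hypothetical helper names: a sketch under stated assumptions,
# not the paper's implementation.

def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """On-policy Monte Carlo target: discounted return from each step to episode end."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one already computed for step t + 1.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def q_learning_targets(rewards: Sequence[float], next_q: Sequence[float],
                       dones: Sequence[bool], gamma: float) -> List[float]:
    """Off-policy bootstrap target r + gamma * next_q, with no bootstrap at terminal steps.

    Assumption: next_q[t] is supplied by the caller, e.g. max_a' Q(s', a') for DQN
    or Q(s', mu(s')) for DDPG.
    """
    return [r + (0.0 if done else gamma * q)
            for r, q, done in zip(rewards, next_q, dones)]


def mixed_targets(mc: Sequence[float], ql: Sequence[float], beta: float) -> List[float]:
    """Assumed convex mix: beta weights the Monte Carlo target, (1 - beta) the bootstrap one."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(mc) != len(ql):
        raise ValueError("target sequences must have equal length")
    return [beta * m + (1.0 - beta) * q for m, q in zip(mc, ql)]


# Usage on a three-step episode whose last step is terminal.
rewards = [1.0, 0.0, 2.0]
mc = monte_carlo_returns(rewards, gamma=0.5)
ql = q_learning_targets(rewards, next_q=[0.5, 3.0, 0.0],
                        dones=[False, False, True], gamma=0.5)
print(mixed_targets(mc, ql, beta=0.5))  # [1.375, 1.25, 2.0]
```

Setting `beta = 1.0` recovers a purely Monte Carlo target, and `beta = 0.0` recovers a purely bootstrap target. The sketch also assumes that complete episodes are available, so that returns can be accumulated backward from the terminal step.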

BibTeX Entry

@InProceedings{DeepRL16-hausknecht,
  author = {Matthew Hausknecht and Peter Stone},
  title = {On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning},
  booktitle = {Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop},
  location = {New York},
  month = {July},
  year = {2016},
  abstract = {Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability compared to using exclusively one or the other. The same technique applied to DQN in a discrete action space drastically slows down learning. Our findings raise questions about the nature of on-policy and off-policy bootstrap and Monte Carlo updates and their relationship to deep reinforcement learning methods.},
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Jul 21, 2026 11:48:15