On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning (2016)
Matthew Hausknecht and Peter Stone
Temporal-difference-based deep reinforcement learning methods have typically been driven by off-policy, bootstrapped Q-learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets yields superior performance and stability compared to using either exclusively. The same technique applied to DQN in a discrete action space drastically slows down learning. Our findings raise questions about the nature of on-policy and off-policy bootstrap and Monte Carlo updates and their relationship to deep reinforcement learning methods.
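The abstract does not spell out how the two kinds of targets are blended. The Python sketch below illustrates one plausible reading, in which each per-step target is a convex combination of an on-policy Monte Carlo return and a one-step off-policy bootstrap target; the fixed mixing weight beta, the function name mixed_targets, and the episode-array interface are assumptions made for illustration, not the paper's implementation.

    # Illustrative sketch only: how the targets are combined is an assumption,
    # governed here by a fixed mixing weight `beta`.
    import numpy as np

    def mixed_targets(rewards, next_q_values, dones, gamma=0.99, beta=0.5):
        """Blend an on-policy Monte Carlo return with an off-policy bootstrap target.

        rewards       : per-step rewards of one sampled episode, shape (T,)
        next_q_values : critic estimates of the next state-action value, shape (T,)
        dones         : 1.0 where the episode terminates, else 0.0, shape (T,)
        """
        T = len(rewards)
        mc_returns = np.zeros(T)
        running = 0.0
        # On-policy Monte Carlo return: discounted sum of the rewards actually received.
        for t in reversed(range(T)):
            running = rewards[t] + gamma * running * (1.0 - dones[t])
            mc_returns[t] = running
        # Off-policy bootstrap (Q-learning style) target: one-step reward plus
        # the discounted critic estimate at the next step.
        bootstrap = rewards + gamma * next_q_values * (1.0 - dones)
        # Convex combination of the two targets.
        return beta * mc_returns + (1.0 - beta) * bootstrap

With beta = 1 this reduces to a pure Monte Carlo target and with beta = 0 to a pure bootstrap target, which matches the two extremes the abstract compares against the mixture.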
View:
PDF, HTML
Citation:
In Deep Reinforcement Learning: Frontiers and Challenges, IJCAI Workshop, New York, July 2016.