Peter Stone's Selected Publications

Towards a Data Efficient Off-Policy Policy Gradient

Towards a Data Efficient Off-Policy Policy Gradient.
Josiah Hanna and Peter Stone.
In AAAI Spring Symposium on Data Efficient Reinforcement Learning, March 2018.

Abstract

The ability to learn from off-policy data -- data generated from past interaction with the environment -- is essential to data efficient reinforcement learning. Recent work has shown that the use of off-policy data not only allows the re-use of data but can even improve performance in comparison to on-policy reinforcement learning. In this work we investigate if a recently proposed method for learning a better data generation policy, commonly called a behavior policy, can also increase the data efficiency of policy gradient reinforcement learning. Empirical results demonstrate that with an appropriately selected behavior policy we can estimate the policy gradient more accurately. The results also motivate further work into developing methods for adapting the behavior policy as the policy we are learning changes.
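
As a rough illustration of the idea in the abstract, the sketch below estimates a policy gradient from actions sampled by a separate behavior policy, using importance weights to correct for the mismatch. It is a generic importance-sampled REINFORCE estimator on a hypothetical three-armed bandit with a softmax policy, not the method or the experiments from the paper. All function names, the reward values, and the behavior probabilities are assumptions made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(probs, action):
    """Gradient of log pi(action) w.r.t. the softmax preferences."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def true_policy_gradient(theta, rewards):
    """Exact gradient of expected reward, by enumerating all actions."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p, r) in enumerate(zip(probs, rewards)):
        g = grad_log_softmax(probs, a)
        for k in range(len(theta)):
            grad[k] += p * r * g[k]
    return grad


def off_policy_gradient_estimate(theta, behavior_probs, rewards, n_samples, rng):
    """Monte Carlo gradient estimate from actions drawn from the behavior policy.

    Each sample is reweighted by pi(a) / b(a) so that the estimate is
    unbiased for the target policy's gradient (assumes b(a) > 0 wherever
    pi(a) > 0).
    """
    target_probs = softmax(theta)
    actions = list(range(len(theta)))
    grad = [0.0] * len(theta)
    for _ in range(n_samples):
        a = rng.choices(actions, weights=behavior_probs)[0]
        weight = target_probs[a] / behavior_probs[a]  # importance weight
        g = grad_log_softmax(target_probs, a)
        for k in range(len(theta)):
            grad[k] += weight * rewards[a] * g[k] / n_samples
    return grad


if __name__ == "__main__":
    theta = [0.0, 0.0, 0.0]           # hypothetical policy parameters
    rewards = [1.0, 0.0, 0.5]         # hypothetical deterministic rewards
    behavior = [0.5, 0.25, 0.25]      # hypothetical behavior policy
    rng = random.Random(0)
    print("exact:   ", true_policy_gradient(theta, rewards))
    print("estimate:", off_policy_gradient_estimate(theta, behavior, rewards, 20000, rng))
```

Setting the behavior probabilities equal to the target policy's probabilities makes every importance weight 1 and recovers the ordinary on-policy estimator. Trying different behavior probabilities in this toy setting is one way to see how the choice of behavior policy changes the spread of the estimate.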

BibTeX Entry

@InProceedings{AAAISSS2018-Hanna,
  author = {Josiah Hanna and Peter Stone},
  title = {Towards a Data Efficient Off-Policy Policy Gradient},
  booktitle = {AAAI Spring Symposium on Data Efficient Reinforcement Learning},
  location = {Palo Alto, CA},
  month = {March},
  year = {2018},
  abstract = {
The ability to learn from off-policy data -- data generated from past interaction with the environment -- is essential to data efficient reinforcement learning. Recent work has shown that the use of off-policy data not only allows the re-use of data but can even improve performance in comparison to on-policy reinforcement learning. In this work we investigate if a recently proposed method for learning a better data generation policy, commonly called a behavior policy, can also increase the data efficiency of policy gradient reinforcement learning. Empirical results demonstrate that with an appropriately selected behavior policy we can estimate the policy gradient more accurately. The results also motivate further work into developing methods for adapting the behavior policy as the policy we are learning changes. 
  },
}
