Peter Stone's Selected Publications

Classified by TopicClassified by Publication TypeSorted by DateSorted by First Author Last NameClassified by Funding Source


On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search

On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search.
Piyush Khandelwal, Elad Liebman, Scott Niekum, and Peter Stone.
In Proceedings of The 33rd International Conference on Machine Learning, pp. 1319–1328, June 2016.

Download

[PDF]1.3MB  [slides.pdf]1.7MB  

Abstract

Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.

BibTeX Entry

@inproceedings{ICML16-khandelwal,
  title={On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search},
  author={Piyush Khandelwal and Elad Liebman and Scott Niekum and Peter Stone},
  booktitle={Proceedings of The 33rd International Conference on Machine Learning},
  pages={1319--1328},
  location={New York City, NY, USA},
  month={June},
  year={2016},
  abstract = {Over the past decade, Monte Carlo Tree Search (MCTS) and 
specifically Upper Confidence Bound in Trees (UCT) have proven to be quite 
effective in large probabilistic planning domains.  In this paper, we focus on 
how values are backpropagated in the MCTS tree, and apply complex return 
strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 
new MCTS variants. We demonstrate that in some probabilistic planning 
benchmarks from the International Planning Competition (IPC), selecting a 
MCTS variant with a backup strategy different from Monte Carlo averaging can 
lead to substantially better results. We also propose a hypothesis for why 
different backup strategies lead to different performance in particular 
environments, and manipulate a carefully structured grid-world domain to 
provide empirical evidence supporting our hypothesis.},
}

Generated by bib2html.pl (written by Patrick Riley ) on Wed Apr 17, 2024 18:42:53