Peter Stone's Selected Publications



Model-Based Meta Automatic Curriculum Learning

Model-Based Meta Automatic Curriculum Learning.
Zifan Xu, Yulin Zhang, Shahaf S. Shperberg, Reuth Mirsky, Yuqian Jiang, Bo Liu, and Peter Stone.
In ICML workshop on Decision Awareness in Reinforcement Learning (DARL), July 2022.
recorded presentation

Download

[PDF] 880.3kB

Abstract

When an agent trains for one target task, its experience is expected to be useful for training on another target task. This paper formulates the meta curriculum learning problem of building a sequence of intermediate training tasks, called a curriculum, that assists the learner in training toward any given target task. We propose a model-based meta automatic curriculum learning algorithm (MM-ACL) that learns to predict the performance improvement on one task when the policy is trained on another, given contextual information such as the history of training tasks, loss functions, and rollout state-action trajectories from the policy. This predictor facilitates the generation of curricula that optimize the performance of the learner on different target tasks. Our empirical results demonstrate that MM-ACL outperforms a random curriculum, a manually created curriculum, and a commonly used non-stationary bandit algorithm in a GridWorld domain.
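To illustrate the idea in the abstract, here is a minimal, hypothetical Python sketch, not the authors' implementation: a toy predictor averages observed performance improvements per (source task, target task) pair, standing in for the learned model conditioned on richer context, and a greedy loop builds a curriculum by repeatedly selecting the training task with the highest predicted gain on the target. All names (`ImprovementPredictor`, `build_curriculum`) are invented for this sketch.

```python
class ImprovementPredictor:
    """Toy stand-in for MM-ACL's learned model: predicts the performance
    improvement on a target task from training on a source task. Here it
    simply averages past observed gains per (source, target) pair; the
    paper's predictor instead conditions on contextual information such
    as training history and rollout trajectories."""

    def __init__(self):
        self.stats = {}  # (source, target) -> [sum of gains, count]

    def update(self, source, target, observed_gain):
        s = self.stats.setdefault((source, target), [0.0, 0])
        s[0] += observed_gain
        s[1] += 1

    def predict(self, source, target):
        s = self.stats.get((source, target))
        return s[0] / s[1] if s else 0.0


def build_curriculum(predictor, tasks, target, length):
    """Greedy curriculum generation: at each step, choose the training
    task with the highest predicted improvement on the target task."""
    return [max(tasks, key=lambda t: predictor.predict(t, target))
            for _ in range(length)]


if __name__ == "__main__":
    p = ImprovementPredictor()
    # Illustrative observed gains from earlier training runs.
    p.update("easy_grid", "hard_grid", 0.3)
    p.update("medium_grid", "hard_grid", 0.5)
    print(build_curriculum(p, ["easy_grid", "medium_grid"], "hard_grid", 3))
```

In the paper's setting the predictor is retrained as new target tasks arrive, which is what makes the curriculum "meta": the same model transfers across target tasks rather than being fit to a single one.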

BibTeX Entry

@InProceedings{DARL22-ZIFAN,
  author = {Zifan Xu and Yulin Zhang and Shahaf S. Shperberg and Reuth Mirsky and Yuqian Jiang and Bo Liu and Peter Stone},
  title = {Model-Based Meta Automatic Curriculum Learning},
  booktitle = {ICML workshop on Decision Awareness in Reinforcement Learning (DARL)},
  location = {Baltimore, Maryland, USA},
  month = {July},
  year = {2022},
  abstract = {  
When an agent trains for one target task, its experience is expected to be useful for training on another target task. This paper formulates the meta curriculum learning problem of building a sequence of intermediate training tasks, called a curriculum, that assists the learner in training toward any given target task. We propose a model-based meta automatic curriculum learning algorithm (MM-ACL) that learns to predict the performance improvement on one task when the policy is trained on another, given contextual information such as the history of training tasks, loss functions, and rollout state-action trajectories from the policy. This predictor facilitates the generation of curricula that optimize the performance of the learner on different target tasks. Our empirical results demonstrate that MM-ACL outperforms a random curriculum, a manually created curriculum, and a commonly used non-stationary bandit algorithm in a GridWorld domain.
  },
  wwwnote={<a href="https://slideslive.com/38987390">recorded presentation</a>},
}

Generated by bib2html.pl (written by Patrick Riley) on Thu Feb 22, 2024 23:59:29