Peter Stone's Selected Publications

Classified by TopicClassified by Publication TypeSorted by DateSorted by First Author Last NameClassified by Funding Source


Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning.
Jiaheng Hu, Jay Shim, Chen Tang, Yoonchang Sung, Bo Liu, Peter Stone, and Roberto Martin-Martin.
In Reinforcement Learning Journal, August 2026.

Download

[PDF]7.9MB  

Abstract

Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in openended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across diverse lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.

BibTeX Entry

@InProceedings{hu_crlvla2026,
  author   = {Jiaheng Hu and Jay Shim and Chen Tang and Yoonchang Sung and Bo Liu and Peter Stone and Roberto Martin-Martin},
  title    = {Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning},
  booktitle = {Reinforcement Learning Journal},
  year     = {2026},
  month    = {August},
  location = {Montreal, Canada},
  abstract = {Continual Reinforcement Learning (CRL) for Vision-Language-Action (VLA) models is a promising direction toward self-improving embodied agents that can adapt in openended, evolving environments. However, conventional wisdom from continual learning suggests that naive Sequential Fine-Tuning (Seq. FT) leads to catastrophic forgetting, necessitating complex CRL strategies. In this work, we take a step back and conduct a systematic study of CRL for large pretrained VLAs across diverse lifelong RL benchmarks. We find that, contrary to established belief, simple Seq. FT with low-rank adaptation (LoRA) is remarkably strong: it achieves high plasticity, exhibits little to no forgetting, and retains strong zero-shot generalization, frequently outperforming more sophisticated CRL methods. Through detailed analysis, we show that this robustness arises from a synergy between the large pretrained model, parameter-efficient adaptation, and on-policy RL. Together, these components reshape the stability-plasticity trade-off, making continual adaptation both stable and scalable. Our results position Sequential Fine-Tuning as a powerful method for continual RL with VLAs and provide new insights into lifelong learning in the large model era. Code is available at github.com/UT-Austin-RobIn/continual-vla-rl.},
}

Generated by bib2html.pl (written by Patrick Riley ) on Wed Jun 10, 2026 15:26:44