Peter Stone's Selected Publications


VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors

VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors.
Yifeng Zhu, Abhishek Joshi, Peter Stone, and Yuke Zhu.
In Proceedings of the 6th Conference on Robot Learning (CoRL 2022), December 2022.
Project page
Code

Download

[PDF] (4.4 MB)

Abstract

We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms state-of-the-art imitation learning methods by 45.8% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making.
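The abstract describes a policy that attends over object-proposal representations to pick out task-relevant visual factors. As a rough illustration only, the sketch below shows a single scaled dot-product attention step in which one query vector weighs a set of per-proposal feature vectors and a linear head maps the pooled result to an action. It is not VIOLA's implementation: the feature values, the `action_query` vector, and the `linear` action head are hypothetical, and a real transformer policy would add learned projections, multiple heads and layers, and trained weights.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: List[Vector]) -> Tuple[List[float], Vector]:
    """Single-head scaled dot-product attention of one query over object tokens.

    Simplification (assumption): keys and values are the tokens themselves,
    with no learned projections, heads, or stacked layers.
    Returns the attention weights and the weighted sum of the tokens.
    """
    d = len(query)
    weights = softmax([dot(query, t) / math.sqrt(d) for t in tokens])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return weights, pooled


def linear(W: List[Vector], x: Vector) -> Vector:
    """Hypothetical linear action head: one output per row of W."""
    return [dot(row, x) for row in W]


# Hypothetical per-proposal features (e.g. visual features pooled from each box).
proposals = [
    [1.0, 0.0, 0.0, 0.0],  # proposal 0: distractor object
    [0.0, 2.0, 0.0, 0.0],  # proposal 1: task-relevant object
    [0.0, 0.0, 0.5, 0.0],  # proposal 2: background clutter
]
action_query = [0.0, 1.0, 0.0, 0.0]  # hypothetical stand-in for a learned action token

weights, pooled = attend(action_query, proposals)
action = linear([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], pooled)

print([round(w, 2) for w in weights])  # [0.21, 0.58, 0.21]
```

In this toy setup the proposal whose features align with the query receives the largest attention weight, which is the intuition behind attending to task-relevant objects while down-weighting distractors and clutter.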

BibTeX Entry

@inproceedings{corl2022-zhu,
  title={VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors},
  author={Yifeng Zhu and Abhishek Joshi and Peter Stone and Yuke Zhu},
  booktitle={Proceedings of the 6th Conference on Robot Learning (CoRL 2022)},
  location={Auckland, New Zealand},
  month={December},
  year={2022},
  abstract={We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms against object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms state-of-the-art imitation learning methods by 45.8\% in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making.},
  wwwnote={<a href="https://ut-austin-rpl.github.io/VIOLA" target="_blank">Project page</a><br><a href="https://github.com/UT-Austin-RPL/VIOLA">Code</a>}
}

Generated by bib2html.pl (written by Patrick Riley) on Tue May 06, 2025 18:17:40