Vision-based Manipulation from Single Human Video with Open-World Object Graphs.
Yifeng Zhu, Arisrei Lim, Peter Stone, and Yuke Zhu.
Autonomous Robots, 50(27), 2026.
This work presents an object-centric approach to learning vision-based manipulation skills from human videos. We investigate the problem of robot manipulation via imitation in the open-world setting, where a robot learns to manipulate novel objects from a single video demonstration. We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan as an Open-World Object Graph from a single RGB or RGB-D video, and then deriving a policy that conditions on the resulting plan. Our method enables the robot to learn from videos captured by daily mobile devices and generalize to deployment environments with varying visual backgrounds, camera angles, spatial layouts, and novel object instances. We systematically evaluate our method on both short-horizon and long-horizon tasks, using RGB-D and RGB-only demonstration videos. Across our real-world evaluations on varied tasks and demonstration modalities (RGB-D / RGB), we observe an average success rate of 74.4 percent, demonstrating the efficacy of ORION in learning from a single human video in the open world. Additional materials can be found on the project website.
@Article{zhu_vision_based_manipulation_auro_2026,
author = {Yifeng Zhu and Arisrei Lim and Peter Stone and Yuke Zhu},
title = {Vision-based Manipulation from Single Human Video with Open-World Object Graphs},
journal = {Autonomous Robots},
volume = {50},
number = {27},
year = {2026},
abstract = {This work presents an object-centric approach to learning vision-based
manipulation skills from human videos. We investigate the problem of robot
manipulation via imitation in the open-world setting, where a robot learns to
manipulate novel objects from a single video demonstration. We introduce ORION,
an algorithm that tackles the problem by extracting an object-centric
manipulation plan as an Open-World Object Graph from a single RGB or RGB-D video,
and then deriving a policy that conditions on the resulting plan. Our method
enables the robot to learn from videos captured by daily mobile devices and
generalize to deployment environments with varying visual backgrounds, camera
angles, spatial layouts, and novel object instances. We systematically evaluate
our method on both short-horizon and long-horizon tasks, using RGB-D and RGB-only
demonstration videos. Across our real-world evaluations on varied tasks and
demonstration modalities (RGB-D / RGB), we observe an average success rate of
74.4 percent, demonstrating the efficacy of ORION in learning from a single human video
in the open world. Additional materials can be found on the project website.
},
}