Peter Stone's Selected Publications



LLM-GROP: Visually grounded robot task and motion planning with large language models

LLM-GROP: Visually grounded robot task and motion planning with large language models.
Xiaohan Zhang, Yan Ding, Yohei Hayamizu, Zainab Altaweel, Yifeng Zhu, Yuke Zhu, Peter Stone, Chris Paxton, and Shiqi Zhang.
The International Journal of Robotics Research, 2025.

Download

[PDF] (2.5MB)

Abstract

Task planning and motion planning are two of the most important problems in robotics, where task planning methods help robots achieve high-level goals and motion planning methods maintain low-level feasibility. Task and motion planning (TAMP) methods interleave the two processes of task planning and motion planning to ensure goal achievement and motion feasibility. Within the TAMP context, we are concerned with the mobile manipulation (MoMa) of multiple objects, where it is necessary to interleave actions for navigation and manipulation. In particular, we aim to compute where and how each object should be placed given underspecified goals, such as “set up dinner table with a fork, knife and plate.” We leverage the rich common sense knowledge from large language models (LLMs), for example, about how tableware is organized, to facilitate both task-level and motion-level planning. In addition, we use computer vision methods to learn a strategy for selecting base positions to facilitate MoMa behaviors, where the base position corresponds to the robot’s “footprint” and orientation in its operating space. Altogether, this article provides a principled TAMP framework for MoMa tasks that accounts for common sense about object rearrangement and is adaptive to novel situations that include many objects that need to be moved. We performed quantitative experiments in both real-world settings and simulated environments. We evaluated the success rate and efficiency in completing long-horizon object rearrangement tasks. While the robot completed 84.4 percent of real-world object rearrangement trials, subjective human evaluations indicated that the robot’s performance is still lower than that of experienced human waiters.

BibTeX Entry

@Article{xiaohan_ijrr2025,
  author   = {Xiaohan Zhang and Yan Ding and Yohei Hayamizu and Zainab Altaweel and Yifeng Zhu and Yuke Zhu and Peter Stone and Chris Paxton and Shiqi Zhang},
  title    = {LLM-GROP: Visually grounded robot task and motion planning with large language models},
  journal  = {The International Journal of Robotics Research},
  year     = {2025},
  abstract = {Task planning and motion planning are two of the most important problems in
robotics, where task planning methods help robots achieve high-level goals and
motion planning methods maintain low-level feasibility. Task and motion planning
(TAMP) methods interleave the two processes of task planning and motion planning
to ensure goal achievement and motion feasibility. Within the TAMP context, we
are concerned with the mobile manipulation (MoMa) of multiple objects, where it
is necessary to interleave actions for navigation and manipulation. In
particular, we aim to compute where and how each object should be placed given
underspecified goals, such as “set up dinner table with a fork, knife and plate.”
We leverage the rich common sense knowledge from large language models (LLMs),
for example, about how tableware is organized, to facilitate both task-level and
motion-level planning. In addition, we use computer vision methods to learn a
strategy for selecting base positions to facilitate MoMa behaviors, where the
base position corresponds to the robot’s “footprint” and orientation in its
operating space. Altogether, this article provides a principled TAMP framework
for MoMa tasks that accounts for common sense about object rearrangement and is
adaptive to novel situations that include many objects that need to be moved. We
performed quantitative experiments in both real-world settings and simulated
environments. We evaluated the success rate and efficiency in completing
long-horizon object rearrangement tasks. While the robot completed 84.4 percent of
real-world object rearrangement trials, subjective human evaluations indicated
that the robot’s performance is still lower than that of experienced human waiters.
  },
}

Generated by bib2html.pl (written by Patrick Riley) on Thu Oct 02, 2025 22:46:24