LLM-GROP: Visually grounded robot task and motion planning with large language models.
Xiaohan Zhang, Yan Ding, Yohei Hayamizu, Zainab Altaweel, Yifeng Zhu, Yuke Zhu, Peter Stone, Chris Paxton, and Shiqi Zhang.
The International Journal of Robotics Research, 2025.
Task planning and motion planning are two of the most important problems in robotics, where task planning methods help robots achieve high-level goals and motion planning methods maintain low-level feasibility. Task and motion planning (TAMP) methods interleave the two processes of task planning and motion planning to ensure goal achievement and motion feasibility. Within the TAMP context, we are concerned with the mobile manipulation (MoMa) of multiple objects, where it is necessary to interleave actions for navigation and manipulation. In particular, we aim to compute where and how each object should be placed given underspecified goals, such as “set up dinner table with a fork, knife and plate.” We leverage the rich common sense knowledge from large language models (LLMs), for example, about how tableware is organized, to facilitate both task-level and motion-level planning. In addition, we use computer vision methods to learn a strategy for selecting base positions to facilitate MoMa behaviors, where the base position corresponds to the robot’s “footprint” and orientation in its operating space. Altogether, this article provides a principled TAMP framework for MoMa tasks that accounts for common sense about object rearrangement and is adaptive to novel situations that include many objects that need to be moved. We performed quantitative experiments in both real-world settings and simulated environments. We evaluated the success rate and efficiency in completing long-horizon object rearrangement tasks. While the robot completed 84.4 percent of real-world object rearrangement trials, subjective human evaluations indicated that the robot’s performance is still lower than that of experienced human waiters.
@Article{xiaohan_ijrr2025,
  author   = {Xiaohan Zhang and Yan Ding and Yohei Hayamizu and Zainab Altaweel and Yifeng Zhu and Yuke Zhu and Peter Stone and Chris Paxton and Shiqi Zhang},
  title    = {LLM-GROP: Visually grounded robot task and motion planning with large language models},
  journal  = {The International Journal of Robotics Research},
  year     = {2025},
  abstract = {Task planning and motion planning are two of the most important problems in robotics, where task planning methods help robots achieve high-level goals and motion planning methods maintain low-level feasibility. Task and motion planning (TAMP) methods interleave the two processes of task planning and motion planning to ensure goal achievement and motion feasibility. Within the TAMP context, we are concerned with the mobile manipulation (MoMa) of multiple objects, where it is necessary to interleave actions for navigation and manipulation. In particular, we aim to compute where and how each object should be placed given underspecified goals, such as ``set up dinner table with a fork, knife and plate.'' We leverage the rich common sense knowledge from large language models (LLMs), for example, about how tableware is organized, to facilitate both task-level and motion-level planning. In addition, we use computer vision methods to learn a strategy for selecting base positions to facilitate MoMa behaviors, where the base position corresponds to the robot's ``footprint'' and orientation in its operating space. Altogether, this article provides a principled TAMP framework for MoMa tasks that accounts for common sense about object rearrangement and is adaptive to novel situations that include many objects that need to be moved. We performed quantitative experiments in both real-world settings and simulated environments. We evaluated the success rate and efficiency in completing long-horizon object rearrangement tasks. While the robot completed 84.4 percent of real-world object rearrangement trials, subjective human evaluations indicated that the robot's performance is still lower than that of experienced human waiters.},
}
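For readers curious how the pieces described in the abstract fit together, below is a minimal, hypothetical Python sketch of the interleaved loop: an LLM supplies a common-sense symbolic layout for an underspecified goal, the layout is grounded to concrete table coordinates, and motion-infeasible groundings are rejected. All names (query_llm, ground, is_feasible) and the toy geometry are illustrative assumptions, not the paper's implementation, which among other things also learns base positions for the mobile manipulator.

import random
from dataclasses import dataclass

@dataclass
class Placement:
    obj: str
    x: float  # meters, table frame
    y: float

def query_llm(goal: str) -> dict:
    """Stand-in for the task-level LLM query. A real system would prompt a
    language model for a common-sense layout; here we return a canned answer
    for the dinner-table example from the abstract (offsets from the plate)."""
    return {"plate": (0.0, 0.0), "fork": (-0.15, 0.0), "knife": (0.15, 0.0)}

def ground(symbolic, table_w=0.8, table_h=0.6):
    """Sample an anchor point on the table and offset the symbolic layout
    around it to get concrete candidate coordinates."""
    cx = random.uniform(0.2, table_w - 0.2)
    cy = random.uniform(0.2, table_h - 0.2)
    return [Placement(o, cx + dx, cy + dy) for o, (dx, dy) in symbolic.items()]

def is_feasible(placements, table_w=0.8, table_h=0.6, min_sep=0.08):
    """Toy motion-level check: every object stays on the table, and no two
    objects overlap (closer than min_sep on both axes)."""
    for p in placements:
        if not (0 <= p.x <= table_w and 0 <= p.y <= table_h):
            return False
    for i, a in enumerate(placements):
        for b in placements[i + 1:]:
            if abs(a.x - b.x) < min_sep and abs(a.y - b.y) < min_sep:
                return False
    return True

def plan(goal: str, max_tries: int = 50):
    """Interleave task-level suggestion with motion-level feasibility checks."""
    symbolic = query_llm(goal)           # task level: common-sense layout
    for _ in range(max_tries):
        candidate = ground(symbolic)     # grounding: concrete coordinates
        if is_feasible(candidate):       # motion level: reject infeasible
            return candidate
    return None

if __name__ == "__main__":
    for p in plan("set up dinner table with a fork, knife and plate") or []:
        print(f"{p.obj}: ({p.x:.2f}, {p.y:.2f})")

The sample-and-reject structure is the key point of the sketch: the LLM's symbolic suggestion is held fixed while grounding is retried until a motion-feasible instantiation is found, mirroring the goal-achievement/feasibility split the abstract describes.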
Generated by bib2html.pl (written by Patrick Riley) on Thu Oct 02, 2025 22:46:24