Peter Stone's Selected Publications



ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion.
Zichao Hu, Chen Tang, Michael J. Munje, Yifeng Zhu, Alex Liu, Shuijing Liu, Garrett Warnell, Peter Stone, and Joydeep Biswas.
In Conference on Robot Learning, September 2025.

Abstract

This paper considers the problem of enabling robots to navigate dynamic environments while following instructions. The challenge lies in the combinatorial nature of instruction specifications: each instruction can include multiple specifications, and the number of possible specification combinations grows exponentially as the robot’s skill set expands. For example, “overtake the pedestrian while staying on the right side of the road” consists of two specifications: "overtake the pedestrian" and "walk on the right side of the road." To tackle this challenge, we propose ComposableNav, based on the intuition that following an instruction involves independently satisfying its constituent specifications, each corresponding to a distinct motion primitive. Using diffusion models, ComposableNav learns each primitive separately, then composes them in parallel at deployment time to satisfy novel combinations of specifications unseen in training. Additionally, to avoid the onerous need for demonstrations of individual motion primitives, we propose a two-stage training procedure: (1) supervised pre-training to learn a base diffusion model for dynamic navigation, and (2) reinforcement learning fine-tuning that molds the base model into different motion primitives. Through simulation and real-world experiments, we show that ComposableNav enables robots to follow instructions by generating trajectories that satisfy diverse and unseen combinations of specifications, significantly outperforming both non-compositional VLM-based policies and costmap composing baselines.
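
The idea of composing separately learned primitives at deployment time can be illustrated with a toy sketch. The abstract does not state the composition rule, so the code below assumes one common choice for diffusion models: summing the noise predictions of the individual models at each denoising step. The names `compose_noise`, `denoise_step`, and `pull_toward`, the step size, and the two toy "primitives" are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interface: a primitive maps (noisy sample, timestep) to predicted noise.
NoiseModel = Callable[[Vector, int], Vector]


def compose_noise(models: Sequence[NoiseModel], x: Vector, t: int) -> Vector:
    """Sum the noise predictions of all primitives element-wise (assumed rule)."""
    total = [0.0] * len(x)
    for model in models:
        eps = model(x, t)
        total = [a + b for a, b in zip(total, eps)]
    return total


def denoise_step(models: Sequence[NoiseModel], x: Vector, t: int,
                 step_size: float = 0.1) -> Vector:
    """One simplified, deterministic update using the composed prediction."""
    eps = compose_noise(models, x, t)
    return [xi - step_size * ei for xi, ei in zip(x, eps)]


def pull_toward(target: float) -> NoiseModel:
    """Toy stand-in for a learned primitive: its 'noise' points away from a target."""
    return lambda x, t: [xi - target for xi in x]


# Two toy primitives standing in for two specifications of one instruction.
stay_right = pull_toward(1.0)
overtake = pull_toward(3.0)

x = [0.0, 0.0]
for t in range(50, 0, -1):
    x = denoise_step([stay_right, overtake], x, t)
```

In this toy setting the composed sample settles between the two targets, which shows how two independently defined models can jointly shape one output without a model trained on the combination. A real diffusion sampler would add a noise schedule and stochastic terms that are omitted here.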

BibTeX Entry

@InProceedings{zichao_hu_corl2025,
  author   = {Zichao Hu and Chen Tang and Michael J. Munje and Yifeng Zhu and Alex Liu and Shuijing Liu and Garrett Warnell and Peter Stone and Joydeep Biswas},
  title    = {ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion},
  booktitle = {Conference on Robot Learning},
  year     = {2025},
  month    = {September},
  location = {Seoul, Korea},
  abstract = {This paper considers the problem of enabling robots to navigate dynamic
environments while following instructions. The challenge lies in the
combinatorial nature of instruction specifications: each instruction can include
multiple specifications, and the number of possible specification combinations
grows exponentially as the robot’s skill set expands. For example, “overtake the
pedestrian while staying on the right side of the road” consists of two
specifications: "overtake the pedestrian" and "walk on the right side of the
road." To tackle this challenge, we propose ComposableNav, based on the intuition
that following an instruction involves independently satisfying its constituent
specifications, each corresponding to a distinct motion primitive. Using
diffusion models, ComposableNav learns each primitive separately, then composes
them in parallel at deployment time to satisfy novel combinations of
specifications unseen in training. Additionally, to avoid the onerous need for
demonstrations of individual motion primitives, we propose a two-stage training
procedure: (1) supervised pre-training to learn a base diffusion model for
dynamic navigation, and (2) reinforcement learning fine-tuning that molds the
base model into different motion primitives. Through simulation and real-world
experiments, we show that ComposableNav enables robots to follow instructions by
generating trajectories that satisfy diverse and unseen combinations of
specifications, significantly outperforming both non-compositional VLM-based
policies and costmap composing baselines.
  },
}
