Augmenting Robotic Capabilities through Natural Language (2025)
Despite rapid advances in language and vision models, current robots still lag far behind human physical capabilities due to the relative scarcity of real-world robot data compared to online text and images. How can we leverage abundant language data to advance robotic capabilities? Language provides semantic structure that facilitates the understanding of diverse data, improving sample efficiency in scarce-data regimes. It also provides a natural communicative medium for interacting with and learning from humans. To leverage the first benefit of language, we take inspiration from how humans teach each other in video tutorials, using simultaneous video and language streams to teach robots new skills more efficiently. We then show that language can bridge wide visual sim2real gaps, enabling robots to learn tasks from just a few real-world demonstrations by leveraging knowledge from imperfect simulation data. To leverage the second benefit of language, we explore how bidirectional dialog enables robots to solve complex manipulation tasks by communicating and collaborating with a wide distribution of human partners in the real world. We develop a robotic framework that both requests and proactively offers help through mixed-initiative, free-form dialog, enabling the robot to adapt to changing human preferences and to strategically utilize each agent’s physical capabilities. Finally, we discuss avenues for future work, such as how human-robot collaboration can be facilitated through dialog-based replanning, how both agents can improve through bidirectional feedback, and how language-based guidelines extracted from manuals can enable robots to behave more safely and learn more quickly.
View: PDF
Citation: Ph.D. Proposal.