Peter Stone's Selected Publications


Learning Multi-Modal Grounded Linguistic Semantics by Playing I Spy

Learning Multi-Modal Grounded Linguistic Semantics by Playing I Spy.
Jesse Thomason, Jivko Sinapov, Maxwell Svetlik, Peter Stone, and Raymond Mooney.
In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), July 2016.
Demo Video

Abstract

Grounded language learning bridges words like ‘red’ and ‘square’ with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot “I Spy” game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the “I Spy” task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multi-modal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g. ‘small’ negatively correlates with object weight).

BibTeX Entry

@InProceedings{IJCAI16-thomason,
  title={Learning Multi-Modal Grounded Linguistic Semantics by Playing {I Spy}},
  author={Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond Mooney},
  booktitle={Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI)},
  location = {New York City, USA},
  month = {July},
  year = {2016},
  abstract = {Grounded language learning bridges words like ‘red’ and ‘square’ with robot perception. The vast majority of existing work in this space limits robot perception to vision. In this paper, we build perceptual models that use haptic, auditory, and proprioceptive data acquired through robot exploratory behaviors to go beyond vision. Our system learns to ground natural language words describing objects using supervision from an interactive human-robot “I Spy” game. In this game, the human and robot take turns describing one object among several, then trying to guess which object the other has described. All supervision labels were gathered from human participants physically present to play this game with a robot. We demonstrate that our multi-modal system for grounding natural language outperforms a traditional, vision-only grounding framework by comparing the two on the “I Spy” task. We also provide a qualitative analysis of the groundings learned in the game, visualizing what words are understood better with multi-modal sensory information as well as identifying learned word meanings that correlate with physical object properties (e.g. ‘small’ negatively correlates with object weight).},
  wwwnote={<a href="https://youtu.be/jLHzRXPCi_w"> Demo Video</a>},
}
