Machine Learning Research Group | University of Texas

Publications: Connecting Language and Perception

To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Ideally, an AI system would be able to learn language like a human child, by being exposed to utterances in a rich perceptual environment. The perceptual context would provide the necessary supervisory information, and learning the connection between language and perception would ground the system's semantic representations in its perception of the world. As a step in this direction, we are developing systems that learn semantic parsers and language generators from sentences paired only with their perceptual context. This work is part of our research on natural language learning. It is supported by the National Science Foundation through grants IIS-0712097 and IIS-1016312.

Grounded Language Learning [Video Lecture]

AAAI

Learning Language from its Perceptual Context [Video Lecture]

ECML-PKDD

Temporally Streaming Audio-Visual Synchronization for Real-World Videos
[Details] [PDF]
Jordan Voas, Wei-Cheng Tseng, Layne Berry, Xixi Hu, Puyuan Peng, James Stuedemann, and David Harwath
In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), February 2025.
Measuring Sound Symbolism in Audio-visual Models
[Details] [PDF] [Poster]
Wei-Cheng Tseng, Yi-Jen Shih, David Harwath, Raymond Mooney
In IEEE Spoken Language Technology (SLT) Workshop, December 2024.
Multimodal Contextualized Semantic Parsing from Speech
[Details] [PDF] [Slides (PDF)] [Poster] [Video]
Jordan Voas, Raymond Mooney, David Harwath
In Association for Computational Linguistics (ACL), August 2024.
What is the Best Automated Metric for Text to Motion Generation?
[Details] [PDF]
Jordan Voas
Master's Thesis, Department of Computer Science, UT Austin, Austin, TX, May 2023.
Directly Optimizing Evaluation Metrics to Improve Text to Motion
[Details] [PDF]
Yili Wang
Master's Thesis, Department of Computer Science, UT Austin, May 2023.
Systematic Generalization on gSCAN with Language Conditioned Embedding
[Details] [PDF] [Video]
Tong Gao, Qi Huang and Raymond J. Mooney
In The 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, December 2020.
Dialog as a Vehicle for Lifelong Learning
[Details] [PDF] [Slides (PDF)] [Video]
Aishwarya Padmakumar, Raymond J. Mooney
In Position Paper Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDial 2.0), July 2020.
Learning a Policy for Opportunistic Active Learning
[Details] [PDF]
Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP-18), Brussels, Belgium, November 2018.
Learning to Connect Language and Perception
[Details] [PDF]
Raymond J. Mooney
In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 1598-1601, Chicago, IL, July 2008. Senior Member Paper.
Learning Language Semantics from Ambiguous Supervision
[Details] [PDF]
Rohit J. Kate and Raymond J. Mooney
In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), 895-900, Vancouver, Canada, July 2007.
Learning Language from Perceptual Context: A Challenge Problem for AI
[Details] [PDF]
Raymond J. Mooney
In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.