Connecting Language and Perception
To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Ideally, an AI system would learn language the way a human child does: by being exposed to utterances in a rich perceptual environment. The perceptual context would supply the necessary supervisory information, and learning the connection between language and perception would ground the system's semantic representations in its perception of the world. As a step in this direction, we are developing systems that learn semantic parsers and language generators from sentences paired only with their perceptual context. This project is part of our broader research on natural language learning and is supported by the National Science Foundation through grants IIS-0712097 and IIS-1016312.
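To make the learning setting concrete, below is a minimal sketch of cross-situational lexicon learning, the simplest form of learning from ambiguous perceptual supervision: each sentence is paired with the set of candidate meaning symbols from its context, and each word is linked to the symbol it co-occurs with most consistently. This toy Python example is purely illustrative, not one of the algorithms from the publications below; the sample data loosely mimics the RoboCup sportscasting domain used in several of them.

    # A toy sketch of cross-situational lexicon learning, for illustration
    # only (not the actual algorithms from the publications below).
    from collections import defaultdict

    def learn_lexicon(examples):
        """examples: list of (list_of_words, set_of_context_symbols) pairs."""
        pair_counts = defaultdict(int)   # (word, symbol) co-occurrence counts
        word_counts = defaultdict(int)   # how often each word was seen
        for words, symbols in examples:
            for w in words:
                word_counts[w] += 1
                for s in symbols:
                    pair_counts[(w, s)] += 1
        # Map each word to the symbol with the highest co-occurrence ratio.
        lexicon = {}
        for (w, s), c in pair_counts.items():
            score = c / word_counts[w]
            if score > lexicon.get(w, (None, 0.0))[1]:
                lexicon[w] = (s, score)
        return lexicon

    # Toy data in the spirit of RoboCup sportscasting: "kicks" co-occurs
    # with the KICK event in every context where it appears, so it wins.
    examples = [
        ("the purple goalie kicks the ball".split(), {"KICK", "PURPLE1", "BALL"}),
        ("pink3 kicks to pink7".split(), {"KICK", "PINK3", "PINK7"}),
        ("the purple goalie passes the ball".split(), {"PASS", "PURPLE1", "BALL"}),
    ]
    print(learn_lexicon(examples)["kicks"])  # ('KICK', 1.0)

Because the supervision is ambiguous (each sentence could refer to any symbol in its context), the alignment only emerges across many examples; the actual systems below handle full meaning representations rather than single symbols.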
  • Grounded Language Learning [Video Lecture], Raymond J. Mooney, Invited Talk, AAAI, 2013.
  • Learning Language from its Perceptual Context [Video Lecture], Raymond J. Mooney, Invited Talk, ECML-PKDD, 2008.
  • Jesse Thomason, Ph.D. Student, jesse [at] cs utexas edu
  • Subhashini Venugopalan, Ph.D. Student, vsub [at] cs utexas edu
Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild 2014
Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Raymond Mooney, In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), pp. 1218--1227, Dublin, Ireland, August 2014.
Integrating Visual and Linguistic Information to Describe Properties of Objects 2014
Calvin MacKenzie, Undergraduate Honors Thesis, Department of Computer Science, University of Texas at Austin.
A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities 2013
Stephen Roller and Sabine Schulte im Walde, In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pp. 1146--1157, Seattle, WA, October 2013.
Adapting Discriminative Reranking to Grounded Language Learning 2013
Joohyun Kim and Raymond J. Mooney, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), pp. 218--227, Sofia, Bulgaria, August 2013.
Generating Natural-Language Video Descriptions Using Text-Mined Knowledge 2013
Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, and Sergio Guadarrama, In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), pp. 541--547, Bellevue, WA, July 2013.
Generating Natural-Language Video Descriptions Using Text-Mined Knowledge 2013
Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, and Sergio Guadarrama, In Proceedings of the NAACL HLT Workshop on Vision and Language (WVL '13), pp. 10--19, Atlanta, GA, June 2013.
Grounded Language Learning Models for Ambiguous Supervision 2013
Joohyun Kim, Ph.D. Thesis, Department of Computer Science, University of Texas at Austin.
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition 2013
Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko, In Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), pp. 2712--2719, Sydney, Australia, December 2013.
Fast Online Lexicon Learning for Grounded Language Acquisition 2012
David L. Chen, In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), pp. 430--439, Jeju Island, Korea, July 2012.
Generative Models of Grounded Language Learning with Ambiguous Supervision 2012
Joohyun Kim, Ph.D. proposal (Technical Report), Department of Computer Science, University of Texas at Austin.
Improving Video Activity Recognition using Object Recognition and Text Mining 2012
Tanvi S. Motwani and Raymond J. Mooney, In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), pp. 600--605, August 2012.
Learning Language from Ambiguous Perceptual Context 2012
David L. Chen, Ph.D. Thesis, Department of Computer Science, University of Texas at Austin. 196 pages.
Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision 2012
Joohyun Kim and Raymond J. Mooney, In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL '12), pp. 433--444, Jeju Island, Korea, July 2012.
Learning to Interpret Natural Language Navigation Instructions from Observations 2011
David L. Chen and Raymond J. Mooney, In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), pp. 859--865, San Francisco, CA, August 2011.
Panning for Gold: Finding Relevant Semantic Content for Grounded Language Learning 2011
David L. Chen and Raymond J. Mooney, In Proceedings of Symposium on Machine Learning in Speech and Language Processing (MLSLP 2011), June 2011.
Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision 2010
Joohyun Kim and Raymond J. Mooney, In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pp. 543--551, Beijing, China, August 2010.
Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language 2010
David L. Chen, Joohyun Kim, and Raymond J. Mooney, Journal of Artificial Intelligence Research, Vol. 37, pp. 397--435, 2010.
Using Closed Captions as Supervision for Video Activity Recognition 2010
Sonal Gupta and Raymond J. Mooney, In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2010), pp. 1083--1088, Atlanta, GA, July 2010.
Activity Retrieval in Closed Captioned Videos 2009
Sonal Gupta, Master's Thesis, Department of Computer Sciences, University of Texas at Austin. 64 pages.
Learning Language from Perceptual Context 2009
David L. Chen, Ph.D. proposal (unpublished), Department of Computer Sciences, University of Texas at Austin.
Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval 2009
Sonal Gupta and Raymond Mooney, In Proceedings of the CVPR-09 Workshop on Visual and Contextual Learning from Annotated Images and Videos (VCL), Miami, FL, June 2009.
Learning to Connect Language and Perception 2008
Raymond J. Mooney, In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pp. 1598--1601, Chicago, IL, July 2008. Senior Member Paper.
Learning to Sportscast: A Test of Grounded Language Acquisition 2008
David L. Chen and Raymond J. Mooney, In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.
Watch, Listen & Learn: Co-training on Captioned Images and Videos 2008
Sonal Gupta, Joohyun Kim, Kristen Grauman, and Raymond Mooney, In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), pp. 457--472, Antwerp, Belgium, September 2008.
Learning Language Semantics from Ambiguous Supervision 2007
Rohit J. Kate and Raymond J. Mooney, In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), pp. 895--900, Vancouver, Canada, July 2007.
Learning Language from Perceptual Context: A Challenge Problem for AI 2006
Raymond J. Mooney, In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.