Machine Learning Research Group
Artificial Intelligence Lab, Department of Computer Science, The University of Texas at Austin

Publications: Connecting Language and Perception

To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Ideally, an AI system would learn language the way a human child does: by being exposed to utterances in a rich perceptual environment. The perceptual context would supply the necessary supervisory information, and learning the connection between language and perception would ground the system's semantic representations in the world it observes. As a step in this direction, we are developing systems that learn semantic parsers and language generators from sentences paired only with their perceptual context. This work is part of our broader research on natural language learning and is supported by the National Science Foundation through grants IIS-0712097 and IIS-1016312.
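
To make the setting concrete, here is a minimal Python sketch of cross-situational learning from ambiguous perceptual context. It is purely illustrative: the sentences, event records, and the simple intersection rule are invented for this example and are not the algorithm of any publication listed below. Each sentence is paired with the set of candidate events observed in its context, only one of which the sentence actually describes, and repeated exposure narrows down which symbols each word can refer to.

    # Toy sketch: cross-situational word learning from ambiguous contexts.
    # All sentences, events, and symbols are hypothetical.

    def symbols(event):
        """Split an event record such as 'pass(pink1,pink2)' into its symbols."""
        return set(event.replace("(", " ").replace(")", " ").replace(",", " ").split())

    TRAIN = [
        ("pink1 kicks the ball",   {"kick(pink1)", "steal(purple3)"}),
        ("pink1 passes to pink2",  {"pass(pink1,pink2)", "block(purpleG)"}),
        ("purpleG kicks the ball", {"kick(purpleG)", "pass(pink1,pink2)"}),
        ("purple5 kicks it away",  {"kick(purple5)", "block(purpleG)"}),
    ]

    # For every word, intersect the symbol sets of all contexts it occurs in:
    # only symbols that survive every intersection remain as meaning hypotheses,
    # and additional sentence/context pairs prune the sets further.
    hypotheses = {}
    for sentence, events in TRAIN:
        context_symbols = set().union(*(symbols(e) for e in events))
        for word in sentence.split():
            if word in hypotheses:
                hypotheses[word] &= context_symbols
            else:
                hypotheses[word] = set(context_symbols)

    for word in sorted(hypotheses):
        print(f"{word:10s} -> {sorted(hypotheses[word])}")
    # After these four pairs, 'pink1' has been narrowed to {pink1} and 'kicks'
    # to {kick}; words seen in only one context remain ambiguous.

The research below tackles far harder versions of this problem (structured meaning representations, probabilistic alignment, and noisy real-world perception), but the toy example shows how perceptual context alone can supply the supervision.
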
  • Grounded Language Learning [Video Lecture]
    Raymond J. Mooney, Invited Talk, AAAI, 2013.
  • Learning Language from its Perceptual Context [Video Lecture]
    Raymond J. Mooney, Invited Talk, ECML-PKDD, 2008.
  1. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
    [Details] [PDF] [Poster]
    Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Raymond Mooney
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1218--1227, Dublin, Ireland, August 2014.
  2. Integrating Visual and Linguistic Information to Describe Properties of Objects
    [Details] [PDF]
    Calvin MacKenzie
    Undergraduate Honors Thesis, Department of Computer Science, University of Texas at Austin, 2014.
  3. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition
    [Details] [PDF] [Poster]
    Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
    In Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), 2712--2719, Sydney, Australia, December 2013.
  4. A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities
    [Details] [PDF]
    Stephen Roller and Sabine Schulte im Walde
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1146--1157, Seattle, WA, October 2013.
  5. Grounded Language Learning Models for Ambiguous Supervision
    [Details] [PDF] [Slides]
    Joohyun Kim
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2013.
  6. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the NAACL HLT Workshop on Vision and Language (WVL '13), 10--19, Atlanta, Georgia, July 2013.
  7. Adapting Discriminative Reranking to Grounded Language Learning
    [Details] [PDF] [Slides]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 218--227, Sofia, Bulgaria, August 2013.
  8. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), 541--547, July 2013.
  9. Improving Video Activity Recognition using Object Recognition and Text Mining
    [Details] [PDF] [Slides]
    Tanvi S. Motwani and Raymond J. Mooney
    In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), 600--605, August 2012.
  10. Generative Models of Grounded Language Learning with Ambiguous Supervision
    [Details] [PDF] [Slides]
    Joohyun Kim
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, June 2012.
  11. Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '12), 433--444, Jeju Island, Korea, July 2012.
  12. Fast Online Lexicon Learning for Grounded Language Acquisition
    [Details] [PDF] [Slides]
    David L. Chen
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 430--439, July 2012.
  13. Learning Language from Ambiguous Perceptual Context
    [Details] [PDF] [Slides]
    David L. Chen
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012. 196 pages.
  14. Learning to Interpret Natural Language Navigation Instructions from Observations
    [Details] [PDF] [Slides]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 859--865, August 2011.
  15. Panning for Gold: Finding Relevant Semantic Content for Grounded Language Learning
    [Details] [PDF] [Slides]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the Symposium on Machine Learning in Speech and Language Processing (MLSLP 2011), June 2011.
  16. Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 543--551, Beijing, China, August 2010.
  17. Using Closed Captions as Supervision for Video Activity Recognition
    [Details] [PDF]
    Sonal Gupta, Raymond J. Mooney
    In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-2010), 1083--1088, Atlanta, GA, July 2010.
  18. Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language
    [Details] [PDF]
    David L. Chen, Joohyun Kim, Raymond J. Mooney
    Journal of Artificial Intelligence Research, 37:397--435, 2010.
  19. Learning Language from Perceptual Context
    [Details] [PDF] [Slides]
    David L. Chen
    Ph.D. proposal, Department of Computer Sciences, University of Texas at Austin, December 2009.
  20. Activity Retrieval in Closed Captioned Videos
    [Details] [PDF]
    Sonal Gupta
    Master's Thesis, Department of Computer Sciences, University of Texas at Austin, August 2009. 64 pages.
  21. Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval
    [Details] [PDF]
    Sonal Gupta and Raymond Mooney
    In Proceedings of the CVPR-09 Workshop on Visual and Contextual Learning from Annotated Images and Videos (VCL), Miami, FL, June 2009.
  22. Watch, Listen & Learn: Co-training on Captioned Images and Videos
    [Details] [PDF]
    Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 457--472, Antwerp, Belgium, September 2008.
  23. Learning to Sportscast: A Test of Grounded Language Acquisition
    [Details] [PDF] [Slides] [Video]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.
  24. Learning to Connect Language and Perception
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 1598--1601, Chicago, IL, July 2008. Senior Member Paper.
  25. Learning Language Semantics from Ambiguous Supervision
    [Details] [PDF]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007), 895--900, Vancouver, Canada, July 2007.
  26. Learning Language from Perceptual Context: A Challenge Problem for AI
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.