Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Connecting Language and Perception

To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Ideally, an AI system would be able to learn language like a human child, by being exposed to utterances in a rich perceptual environment. The perceptual context would provide the necessary supervisory information, and learning the connection between language and perception would ground the system's semantic representations in its perception of the world. As a step in this direction, we are developing systems that learn semantic parsers and language generators from sentences paired only with their perceptual context. This work is part of our research on natural language learning, and it is supported by the National Science Foundation through grants IIS-0712097 and IIS-1016312.
  • Grounded Language Learning [Video Lecture]
    Raymond J. Mooney, Invited Talk, AAAI, 2013.
  • Learning Language from its Perceptual Context [Video Lecture]
    Raymond J. Mooney, Invited Talk, ECML-PKDD, 2008.
  1. Opportunistic Active Learning for Grounding Natural Language Descriptions
    [Details] [PDF] [Slides (PPT)] [Slides (PDF)]
    Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Justin Hart and Peter Stone and Raymond J. Mooney
    In Sergey Levine and Vincent Vanhoucke and Ken Goldberg, editors, Proceedings of the 1st Annual Conference on Robot Learning (CoRL-17), 67--76, Mountain View, California, November 2017. PMLR.
  2. Natural-Language Video Description with Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan
    PhD Thesis, Department of Computer Science, The University of Texas at Austin, August 2017.
  3. Using Explanations to Improve Ensembling of Visual Question Answering Systems
    [Details] [PDF] [Poster]
    Nazneen Fatema Rajani and Raymond J. Mooney
    In Proceedings of the IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 43--47, Melbourne, Australia, August 2017.
  4. Guiding Interaction Behaviors for Multi-modal Grounded Language Learning
    [Details] [PDF] [Poster]
    Jesse Thomason and Jivko Sinapov and Raymond J. Mooney
    In Proceedings of the Workshop on Language Grounding for Robotics at ACL 2017 (RoboNLP-17), Vancouver, Canada, August 2017.
  5. Multi-Modal Word Synset Induction
    [Details] [PDF] [Slides (PDF)] [Poster]
    Jesse Thomason and Raymond J. Mooney
    In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI-17), 4116--4122, Melbourne, Australia, 2017.
  6. Captioning Images with Diverse Objects
    [Details] [PDF] [Slides (PDF)] [Poster]
    Subhashini Venugopalan and Lisa Anne Hendricks and Marcus Rohrbach and Raymond Mooney and Trevor Darrell and Kate Saenko
    In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR-17), 5753--5761, 2017.
  7. Stacking With Auxiliary Features: Improved Ensembling for Natural Language and Vision
    [Details] [PDF] [Slides (PDF)]
    Nazneen Fatema Rajani
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
  8. Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception
    [Details] [PDF] [Slides (PDF)]
    Jesse Thomason
    November 2016. PhD proposal, Department of Computer Science, The University of Texas at Austin.
  9. Learning Multi-Modal Grounded Linguistic Semantics by Playing "I Spy"
    [Details] [PDF] [Slides (PDF)] [Poster]
    Jesse Thomason and Jivko Sinapov and Maxwell Svetlik and Peter Stone and Raymond J. Mooney
    In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), 3477--3483, New York City, 2016.
  10. Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text
    [Details] [PDF] [Poster]
    Subhashini Venugopalan and Lisa Anne Hendricks and Raymond Mooney and Kate Saenko
    In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP-16), 1961--1966, Austin, Texas, 2016.
  11. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
    [Details] [PDF]
    Lisa Anne Hendricks and Subhashini Venugopalan and Marcus Rohrbach and Raymond Mooney and Kate Saenko and Trevor Darrell
    In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR-16), 1--10, 2016.
  12. Natural Language Video Description using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan
    November 2015. PhD proposal, Department of Computer Science, The University of Texas at Austin.
  13. Sequence to Sequence -- Video to Text
    [Details] [PDF]
    Subhashini Venugopalan and Marcus Rohrbach and Jeff Donahue and Raymond J. Mooney and Trevor Darrell and Kate Saenko
    In Proceedings of the 2015 International Conference on Computer Vision (ICCV-15), Santiago, Chile, December 2015.
  14. Learning to Interpret Natural Language Commands through Human-Robot Dialog
    [Details] [PDF] [Slides (PDF)]
    Jesse Thomason and Shiqi Zhang and Raymond Mooney and Peter Stone
    In Proceedings of the 2015 International Joint Conference on Artificial Intelligence (IJCAI), 1923--1929, Buenos Aires, Argentina, July 2015.
  15. Translating Videos to Natural Language Using Deep Recurrent Neural Networks
    [Details] [PDF] [Slides (PDF)]
    Subhashini Venugopalan and Huijuan Xu and Jeff Donahue and Marcus Rohrbach and Raymond Mooney and Kate Saenko
    In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT 2015), 1494--1504, Denver, Colorado, June 2015.
  16. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild
    [Details] [PDF] [Poster]
    Jesse Thomason and Subhashini Venugopalan and Sergio Guadarrama and Kate Saenko and Raymond Mooney
    In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), 1218--1227, Dublin, Ireland, August 2014.
  17. Integrating Visual and Linguistic Information to Describe Properties of Objects
    [Details] [PDF]
    Calvin MacKenzie
    2014. Undergraduate Honors Thesis, Computer Science Department, University of Texas at Austin.
  18. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition
    [Details] [PDF] [Poster]
    Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko
    In Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), 2712--2719, Sydney, Australia, December 2013.
  19. A Multimodal LDA Model Integrating Textual, Cognitive and Visual Modalities
    [Details] [PDF]
    Stephen Roller and Sabine Schulte im Walde
    In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 1146--1157, Seattle, WA, October 2013.
  20. Grounded Language Learning Models for Ambiguous Supervision
    [Details] [PDF] [Slides (PPT)]
    Joo Hyun Kim
    PhD Thesis, Department of Computer Science, University of Texas at Austin, December 2013.
  21. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides (PPT)]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the NAACL HLT Workshop on Vision and Language (WVL '13), 10--19, Atlanta, Georgia, July 2013.
  22. Adapting Discriminative Reranking to Grounded Language Learning
    [Details] [PDF] [Slides (PPT)]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), 218--227, Sofia, Bulgaria, August 2013.
  23. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
    [Details] [PDF] [Slides (PPT)]
    Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama
    In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), 541--547, July 2013.
  24. Improving Video Activity Recognition using Object Recognition and Text Mining
    [Details] [PDF] [Slides (PPT)]
    Tanvi S. Motwani and Raymond J. Mooney
    In Proceedings of the 20th European Conference on Artificial Intelligence (ECAI-2012), 600--605, August 2012.
  25. Generative Models of Grounded Language Learning with Ambiguous Supervision
    [Details] [PDF] [Slides (PPT)]
    Joohyun Kim
    Technical Report, PhD proposal, Department of Computer Science, The University of Texas at Austin, June 2012.
  26. Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Natural Language Learning (EMNLP-CoNLL '12), 433--444, Jeju Island, Korea, July 2012.
  27. Fast Online Lexicon Learning for Grounded Language Acquisition
    [Details] [PDF] [Slides (PPT)]
    David L. Chen
    In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL-2012), 430--439, July 2012.
  28. Learning Language from Ambiguous Perceptual Context
    [Details] [PDF] [Slides (PPT)]
    David L. Chen
    PhD Thesis, Department of Computer Science, University of Texas at Austin, May 2012. 196 pages.
  29. Learning to Interpret Natural Language Navigation Instructions from Observations
    [Details] [PDF] [Slides (PPT)]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI-2011), 859--865, August 2011.
  30. Panning for Gold: Finding Relevant Semantic Content for Grounded Language Learning
    [Details] [PDF] [Slides (PDF)]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the Symposium on Machine Learning in Speech and Language Processing (MLSLP 2011), June 2011.
  31. Generative Alignment and Semantic Parsing for Learning from Ambiguous Supervision
    [Details] [PDF]
    Joohyun Kim and Raymond J. Mooney
    In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), 543--551, Beijing, China, August 2010.
  32. Using Closed Captions as Supervision for Video Activity Recognition
    [Details] [PDF]
    Sonal Gupta, Raymond J. Mooney
    In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2010), 1083--1088, Atlanta, GA, July 2010.
  33. Training a Multilingual Sportscaster: Using Perceptual Context to Learn Language
    [Details] [PDF]
    David L. Chen, Joohyun Kim, Raymond J. Mooney
    Journal of Artificial Intelligence Research, 37:397--435, 2010.
  34. Learning Language from Perceptual Context
    [Details] [PDF] [Slides (PPT)]
    David L. Chen
    December 2009. PhD proposal, Department of Computer Sciences, University of Texas at Austin.
  35. Activity Retrieval in Closed Captioned Videos
    [Details] [PDF]
    Sonal Gupta
    Master's Thesis, Department of Computer Sciences, University of Texas at Austin, August 2009. 64 pages.
  36. Using Closed Captions to Train Activity Recognizers that Improve Video Retrieval
    [Details] [PDF]
    Sonal Gupta and Raymond Mooney
    In Proceedings of the CVPR-09 Workshop on Visual and Contextual Learning from Annotated Images and Videos (VCL), Miami, FL, June 2009.
  37. Watch, Listen & Learn: Co-training on Captioned Images and Videos
    [Details] [PDF]
    Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney
    In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 457--472, Antwerp, Belgium, September 2008.
  38. Learning to Sportscast: A Test of Grounded Language Acquisition
    [Details] [PDF] [Slides (PPT)] [Video]
    David L. Chen and Raymond J. Mooney
    In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, July 2008.
  39. Learning to Connect Language and Perception
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 1598--1601, Chicago, IL, July 2008. Senior Member Paper.
  40. Learning Language Semantics from Ambiguous Supervision
    [Details] [PDF]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), 895--900, Vancouver, Canada, July 2007.
  41. Learning Language from Perceptual Context: A Challenge Problem for AI
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.