Department of Computer Science

Machine Learning Research Group

University of Texas at Austin Artificial Intelligence Lab

Publications: Connecting Language and Perception

To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Ideally, an AI system would be able to learn language like a human child, by being exposed to utterances in a rich perceptual environment. The perceptual context would provide the necessary supervisory information, and learning the connection between language and perception would ground the system's semantic representations in its perception of the world. As a step in this direction, our research is developing systems that learn semantic parsers and language generators from sentences paired only with their perceptual context. It is part of our research on natural language learning. Our research on this topic is supported by the National Science Foundation through grants IIS-0712097 and IIS-1016312.
  • Grounded Language Learning [Video Lecture]
  • Raymond J. Mooney, Invited Talk, AAAI, 2013.
  • Learning Language from its Perceptual Context [Video Lecture]
  • Raymond J. Mooney, Invited Talk, ECML-PKDD, 2008.

Sub-areas:
  1. Systematic Generalization on gSCAN with Language Conditioned Embedding
    [Details] [PDF] [Video]
    Tong Gao, Qi Huang and Raymond J. Mooney
    In The 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing , December 2020.
    Systematic Generalization refers to a learning algorithm’s ability to extrapolate learned behavior to unseen situations that are distinct but semantically similar to its training data. As shown in recent work, state-of-the-art deep learning models fail dramatically even on tasks for which they are designed when the test set is systematically different from the training data. We hypothesize that explicitly modeling the relations between objects in their contexts while learning their representations will help achieve systematic generalization. There- fore, we propose a novel method that learns objects’ contextualized embedding with dynamic message-passing conditioned on the input natural language and is end-to-end trainable with other downstream deep learning modules. To our knowledge, this model is the first one that significantly outperforms the provided baseline and reaches state-of-the-art performance on grounded SCAN (gSCAN), a grounded natural language navigation dataset designed to require systematic generalization in its test splits.
    ML ID: 390
  2. Dialog as a Vehicle for Lifelong Learning
    [Details] [PDF] [Slides (PDF)] [Video]
    Aishwarya Padmakumar, Raymond J. Mooney
    In Position Paper Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDial 2.0), July 2020.
    Dialog systems research has primarily been focused around two main types of applications – task-oriented dialog systems that learn to use clarification to aid in understanding a goal, and open-ended dialog systems that are expected to carry out unconstrained “chit chat” conversations. However, dialog interactions can also be used to obtain various types of knowledge that can be used to improve an underlying language understanding system, or other machine learning systems that the dialog acts over. In this position paper, we present the problem of designing dialog systems that enable lifelong learning as an important challenge problem, in particular for applications involving physically situated robots. We include examples of prior work in this direction, and discuss challenges that remain to be addressed.
    ML ID: 386
  3. Generating Animated Videos of Human Activities from Natural Language Descriptions
    [Details] [PDF] [Poster]
    Angela S. Lin, Lemeng Wu, Rodolfo Corona , Kevin Tai , Qixing Huang , Raymond J. Mooney
    In Proceedings of the Visually Grounded Interaction and Language Workshop at NeurIPS 2018, December 2018.
    Generating realistic character animations is of great importance in computer graphics and related domains. Existing approaches for this application involve a significant amount of human interaction. In this paper, we introduce a system that maps a natural language description to an animation of a humanoid skeleton. Our system is a sequence-to-sequence model that is pretrained with an autoencoder objective and then trained end-to-end.
    ML ID: 369
  4. Learning a Policy for Opportunistic Active Learning
    [Details] [PDF]
    Aishwarya Padmakumar, Peter Stone, Raymond J. Mooney
    In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP-18), Brussels, Belgium, November 2018.
    Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.
    ML ID: 368
  5. Learning to Connect Language and Perception
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), 1598--1601, Chicago, IL, July 2008. Senior Member Paper.
    To truly understand language, an intelligent system must be able to connect words, phrases, and sentences to its perception of objects and events in the world. Current natural language processing and computer vision systems make extensive use of machine learning to acquire the probabilistic knowledge needed to comprehend linguistic and visual input. However, to date, there has been relatively little work on learning the relationships between the two modalities. In this talk, I will review some of the existing work on learning to connect language and perception, discuss important directions for future research in this area, and argue that the time is now ripe to make a concerted effort to address this important, integrative AI problem.
    ML ID: 216
  6. Learning Language Semantics from Ambiguous Supervision
    [Details] [PDF]
    Rohit J. Kate and Raymond J. Mooney
    In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), 895-900, Vancouver, Canada, July 2007.
    This paper presents a method for learning a semantic parser from ambiguous supervision. Training data consists of natural language sentences annotated with multiple potential meaning representations, only one of which is correct. Such ambiguous supervision models the type of supervision that can be more naturally available to language-learning systems. Given such weak supervision, our approach produces a semantic parser that maps sentences into meaning representations. An existing semantic parsing learning system that can only learn from unambiguous supervision is augmented to handle ambiguous supervision. Experimental results show that the resulting system is able to cope up with ambiguities and learn accurate semantic parsers.
    ML ID: 200
  7. Learning Language from Perceptual Context: A Challenge Problem for AI
    [Details] [PDF]
    Raymond J. Mooney
    In Proceedings of the 2006 AAAI Fellows Symposium, Boston, MA, July 2006.
    We present the problem of learning to understand natural language from examples of utterances paired only with their relevant real-world context as an important challenge problem for AI. Machine learning has been adopted as the most effective way of developing natural-language processing systems; however, currently, complex annotated corpora are required for training. By learning language from perceptual context, the need for laborious annotation is removed and the system's resulting understanding is grounded in its perceptual experience.
    ML ID: 192