Continuously Improving Natural Language Understanding for Robotic Systems through Semantic Parsing, Dialog, and Multi-modal Perception (2016)
Robotic systems that interact with untrained human users must be able to understand and respond to natural language commands and questions. If a person requests ``take me to Alice's office'', the system and person must know that Alice is a person who owns some unique office. Similarly, if a person requests ``bring me the heavy, green mug'', the system and person must both know ``heavy'', ``green'', and ``mug'' are properties that describe an object in the environment, and have similar ideas about to what objects those properties apply. To facilitate deployment, methods to achieve these goals should require little initial in-domain data.

We present completed work on understanding human language commands using sparse initial resources for semantic parsing. Clarification dialog with humans simultaneously resolves misunderstandings and generates more training data for better downstream parser performance. We introduce multi-modal grounding classifiers to give the robotic system perceptual contexts to understand object properties like ``green'' and ``heavy''. Additionally, we introduce and explore the task of word sense synonym set induction, which aims to discover polysemy and synonymy, which is helpful in the presence of sparse data and ambiguous properties such as ``light'' (light-colored versus lightweight).

We propose to combine these orthogonal components into an integrated robotic system that understands human commands involving both static domain knowledge (such as who owns what office) and perceptual grounding (such as object retrieval). Additionally, we propose to strengthen the perceptual grounding component by performing word sense synonym set induction on object property words. We offer several long-term proposals to improve such an integrated system: exploring novel objects using only the context-necessary set of behaviors, a more natural learning paradigm for perception, and leveraging linguistic accommodation to improve parsing.

PhD proposal, Department of Computer Science, The University of Texas at Austin.

Slides (PDF)
Jesse Thomason Ph.D. Student jesse [at] cs utexas edu