Leveraging Commonsense Reasoning and Multimodal Perception for Robot Spoken Dialog Systems (2017)
Dongcai Lu, Shiqi Zhang, Peter Stone, and Xiaoping Chen
Probabilistic graphical models, such as partially observable Markov decision processes (POMDPs), have been used in stochastic spoken dialog systems to handle the inherent uncertainty in speech recognition and language understanding. Such dialog systems are limited in that the model can include only a relatively small number of domain variables if good-quality dialog policies are to be generated. At the same time, the non-language perception modalities on robots, such as vision-based facial expression recognition and LiDAR-based distance detection, are difficult to integrate into this process. In this paper, we use a probabilistic commonsense reasoner to “guide” our POMDP-based dialog manager, and present a principled, multimodal dialog management (MDM) framework that allows the robot’s dialog belief state to be seamlessly updated both by observations of human spoken language and by exogenous events such as changes in human facial expressions. The MDM approach has been implemented and evaluated both in simulation and on a real mobile robot using guidance tasks.
In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, September 2017.
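
To illustrate the idea in the abstract of one dialog belief state being updated by both spoken-language observations and exogenous events, here is a minimal sketch of a discrete Bayes update in Python. It is not the paper's MDM framework and does not model the commonsense reasoner or the POMDP policy. The state names, the likelihood values, and the function names are all hypothetical and chosen only for illustration.

```python
def normalize(belief):
    """Rescale a dict of non-negative weights so they sum to 1."""
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return {state: weight / total for state, weight in belief.items()}


def update_belief(belief, likelihood):
    """Bayes rule: posterior(s) is proportional to likelihood(s) * belief(s)."""
    return normalize({s: belief[s] * likelihood[s] for s in belief})


# Hypothetical guidance-task states: the place the person wants to reach.
belief = {"lab": 1 / 3, "office": 1 / 3, "kitchen": 1 / 3}

# Assumed speech model: probability of hearing "office" given each true goal.
heard_office = {"lab": 0.2, "office": 0.7, "kitchen": 0.1}
belief = update_belief(belief, heard_office)

# Assumed exogenous event: a facial expression read as mild disagreement,
# taken here to be less likely when the current best guess is correct.
saw_disagreement = {"lab": 0.5, "office": 0.2, "kitchen": 0.5}
belief = update_belief(belief, saw_disagreement)

for state, prob in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {prob:.3f}")
```

Both kinds of evidence pass through the same update function, so a non-language event can lower the confidence gained from a speech observation without any special handling. In this toy example "office" remains the most likely goal, but with less probability mass than after the speech observation alone.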

Peter Stone Faculty pstone [at] cs utexas edu
Shiqi Zhang Postdoctoral Alumni szhang [at] cs utexas edu