Robot Developmental Learning of an Object Ontology Grounded in Sensorimotor Experience (2007)
How can a robot learn to conceptualize its environment in terms of objects and actions, starting from its intrinsic "pixel-level" sensorimotor interface? Several domains in artificial intelligence (including language, planning, and logic) rely on the existence of a symbolic representation that provides objects, relations, and actions. With real robots it is difficult to ground these high-level symbolic representations, because hand-written object models and control routines are often brittle and fail to account for the complexities of the real world. In contrast, developmental psychologists describe how an infant's naive understanding of objects transforms with experience into an adult's more sophisticated understanding. Can a robot's understanding of objects develop similarly? This thesis describes a learning process that leads to a simple and useful theory of objects, their properties, and the actions that apply to them. The robot's initial "pixel-level" experience consists of a range-sensor image stream and a background model of its immediate surroundings. The background model is an occupancy grid that explains away most of the range-sensor data using a static world assumption. To this developing robot, an "object" is a theoretical construct abduced to explain a subset of the robot's sensorimotor experience that is not explained by the background model. This approach leads to the Object Perception and Action Learner (OPAL). OPAL starts with a simple theory of objects that is used to bootstrap more sophisticated capabilities. In the initial theory, the sensor returns explained by an object have spatial and temporal proximity. This allows the robot to individuate, track, describe, and classify objects (such as a chair or wastebasket) in a simple scene without complex prior knowledge. The initial theory is used to learn a more sophisticated theory. 
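The initial theory's use of spatial proximity can be illustrated with a minimal sketch. It is not OPAL's implementation: the grid layout, cell size, gap threshold, and function names below are all assumptions made for illustration. Range-sensor endpoints that fall in cells the background occupancy grid already marks as occupied are treated as explained. The remaining endpoints are grouped by single-link clustering, so that each cluster becomes a candidate object. Temporal proximity (tracking clusters across frames) is omitted here.

```python
import math

def unexplained_returns(points, occupied_cells, cell_size=1.0):
    """Keep sensor endpoints that the background grid does not explain.

    occupied_cells is a hypothetical stand-in for the occupancy grid:
    a set of (col, row) indices of cells believed to be static obstacles.
    """
    kept = []
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in occupied_cells:
            kept.append((x, y))
    return kept

def cluster_by_proximity(points, max_gap=0.3):
    """Single-link clustering: points within max_gap of each other share a cluster."""
    clusters = []
    unvisited = list(points)
    while unvisited:
        frontier = [unvisited.pop()]
        cluster = []
        while frontier:
            p = frontier.pop()
            cluster.append(p)
            # Pull every still-unassigned neighbour of p into this cluster.
            near = [q for q in unvisited if math.dist(p, q) <= max_gap]
            for q in near:
                unvisited.remove(q)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Assumed toy scene: a wall in row 3 of the grid plus two small objects.
wall_cells = {(0, 3), (1, 3), (2, 3)}
scan = [(0.5, 3.2), (1.5, 3.1), (2.5, 3.3),   # explained by the wall
        (1.0, 1.0), (1.1, 1.05),              # candidate object A
        (4.0, 1.0), (4.1, 0.9)]               # candidate object B
candidates = cluster_by_proximity(unexplained_returns(scan, wall_cells))
```

In this toy scene the three wall returns are discarded and the remaining four returns fall into two clusters, one per candidate object.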
First, the robot uses the perceptual representations described above to create structurally consistent object models that support object localization and recognition. Second, the robot learns actions that support planning to achieve object-based goals. The combined system extends the robot's representational capabilities to include objects and both constant and time-varying properties of objects. The robot can use constant properties such as shape to recognize objects it has previously observed. It can also use time-varying properties such as location or orientation to track objects that move. These properties can be used to represent the learned preconditions and postconditions of actions. Thus, the robot can make and execute plans to achieve object-based goals, using the preconditions and postconditions to infer the ordering constraints among actions in the plan. The learning process and the learned representations were evaluated with metrics that support verification by both the robot and the experimenter. The robot learned object shape models that are structurally consistent to within the robot's sensor precision. The learned shape models also support accurate object classification with externally provided labels. The robot achieved goals specified in terms of object properties by planning with the learned actions, solving tasks such as facing an object, approaching an object, and moving an object to a target location. The robot completed these tasks both reliably and accurately.
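How preconditions and postconditions can determine the order of actions in a plan is sketched below. This is a generic breadth-first forward search over STRIPS-style actions, not the planner used in the thesis. The action names mirror the tasks listed above, but the predicates ("facing", "near", "at_target") and the hand-written conditions are hypothetical stand-ins for what the robot would learn.

```python
from collections import deque
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    pre: frozenset                  # facts that must hold before the action
    add: frozenset                  # facts made true by the action
    delete: frozenset = frozenset() # facts made false by the action

def plan(start, goal, actions):
    """Breadth-first forward search; returns a shortest list of action names or None."""
    start, goal = frozenset(start), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for a in actions:
            if a.pre <= state:  # action is applicable only if its preconditions hold
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [a.name]))
    return None

# Hypothetical conditions, written by hand purely for illustration.
face = Action("face", frozenset(), frozenset({"facing"}))
approach = Action("approach", frozenset({"facing"}), frozenset({"near"}))
push = Action("push", frozenset({"facing", "near"}), frozenset({"at_target"}))

steps = plan(set(), {"at_target"}, [push, approach, face])
```

Although the actions are supplied in an arbitrary order, the search can only apply `push` after `approach`, and `approach` after `face`, because each action's preconditions are produced by the postconditions of the previous one.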
PhD Thesis, Computer Sciences Department, University of Texas at Austin.

Joseph Modayil, Ph.D. Alumni, modayil [at] cs utexas edu