CS 395T: Intelligent Robotics
- Spring 2006. TTh, 3:30 - 5:00 pm
- TAY 3.144
- Unique #54215.
- Professor Benjamin Kuipers
- Office hours: 1-2 pm, TTh.
For robots to be intelligent in the way people are intelligent,
they will have to learn about their world, and their own ability to
interact with it, much like people do. This research seminar will
investigate new research directions in robot learning.
Traditionally, robots have been useful in manufacturing by moving
blindly but precisely in totally controlled workcells. Traditionally,
symbolic AI systems have given the appearance of intelligence by
applying logical inference algorithms to symbol structures whose
primitive elements are specified by human programmers. This has left
AI systems open to Searle's famous "Chinese Room" critique, arguing
that they only mimic intelligence: they are merely "faking it".
To answer this philosophical challenge, and to be useful in a host of
real-world applications on Earth and in space, AI systems need to be
robots, with sensors and effectors embedded in the physical world.
Not only that, but these robots must learn the nature of their own
sensorimotor interaction with the environment, and must create their
own symbols, grounded in their own experience.
Robots are being created with ever more complex and richly structured
sensors. The sensorimotor system evolves over time, sometimes
deteriorating, but sometimes being augmented with new "plug-and-play"
sensors. Humans are astonishingly adaptable to sensorimotor changes,
and children do an amazing job of learning to use their sensors and
effectors in a few short years after birth. We can learn important
things about robots from research on children. And robot models may
help us create better theories of child development.
Our main focus is on robot learning of the "foundational domains" that
underlie commonsense knowledge: space, time, actions, objects,
properties and affordances, causality, and so on. The question is how
these higher-level ontologies can be learned from experience with
low-level sensor and motor interactions with the world (the "pixel level").
Perhaps surprisingly, this has led us to certain questions about the
nature of consciousness, what its properties are, what those
properties tell us about the architecture of the mind, and how it is
involved in learning these foundational domains. We will read books and
papers from a wide range of disciplines, attempting to see the underlying
reality that they are describing from different points of view.
This is a research seminar, intended first to bring you to the state
of the art, and then to help you do a project and paper of publishable
quality. There will be a significant amount of reading and discussion
of recent research papers that will be handed out.
The requirements of the course will be:
- (35%) Class presentation on one or more papers.
- (15%) Class participation in discussions.
- (50%) Term project, presentation, and paper.
Each class member will select a topic and present the material to the
class. Each topic will have an associated reading that the entire
class will read, but the presenter is responsible for finding and
reading additional material, becoming an expert in the area, creating
an illuminating example to present, and leading a discussion.
Pick a presentation topic that works well with your term project
topic. The papers will be accessible online through the UT Library,
or via links here. In some cases, you will need to review several
related papers by the authors.
Be prepared to give a 45-minute presentation, followed by specific
questions and more general discussion of the value and importance of
the material presented.
Prepare PowerPoint slides for your presentation. Send me a copy of
your slides two or three days before your presentation, and I will
give you feedback as quickly as I can. Make copies of your slides to
hand out to the class before your presentation.
Here is a thematic outline. You don't need to cover the points in
exactly this order, but try to address these needs for your audience.
- What is the problem? Why is it important? Why should the reader care?
- What assumptions are being made?
- How does this method work? Provide an intuition to guide the hearer
through the technical details. Then provide a more detailed example
to show what the intuitions mean.
- What are the strengths and limitations of this approach?
- How can you evaluate the benefits?
- What are open problems in this area?
- How does this help us? Where is the gold?
Each student will pick one of the available sub-bullets, and will be
responsible for presenting and discussing that topic. The related
paper (or papers) will be provided soon.
- Introduction (2 weeks: Jan 17 - 26) [Kuipers]
- The Hybrid Spatial Semantic Hierarchy [Beeson and Modayil]
- Progress in Foundational Learning [Provost and Modayil]
- Course intro: How can a robot have a mind of its own?
- Learning about an uninterpreted sensorimotor system
[Pierce and Kuipers, AIJ, 1997], which provides the foundations
for our Spatial Semantic Hierarchy work on cognitive mapping in
[Kuipers, AIJ, 2000].
- Statistical Learning Methods (2 weeks: Jan 31 - Feb 9)
- Clustering: AutoClass
- [Cheeseman & Stutz, 1996]
- Factoring: Principal Component Analysis (PCA) and
Independent Component Analysis (ICA)
- Hyvarinen, Survey on independent component analysis.
Neural Computing Surveys 2: 94-128, 1999.
- Steyvers, Multidimensional Scaling.
Encyclopedia of Cognitive Science, 2002.
- Clustering: Self-Organizing Maps and Growing Neural Gas
- Factoring: Nonlinear dimensionality reduction
- Tenenbaum, de Silva and Langford,
A global geometric framework for nonlinear dimensionality reduction.
Science 290: 2319-2323, 2000.
- Roweis and Saul,
Nonlinear dimensionality reduction by locally linear embedding.
Science 290: 2323-2326, 2000.
- Learning about Actions (2 weeks: Feb 14 - 23)
- Marginal attribution: Drescher's schema mechanism [Kuipers]
- Gary Drescher, Made-Up Minds, MIT Press, 1991.
- Bayesian structure learning in children
- Learning behavior models from observations
- Fox, Ghallab, Infantes, and Long, Robot introspection through
learned hidden Markov models.
Artificial Intelligence 170: 59-113, 2006.
- Curiosity-driven exploration (Oudeyer and Kaplan)
- Oudeyer and Kaplan, Intelligent adaptive curiosity.
Epigenetic Robotics Conference, 2004.
- Computer and Human Vision (2 weeks: Feb 28 - Mar 9)
- Vision as an embodied process
- Ballard, Hayhoe, Pook, and Rao,
Deictic codes for the embodiment of cognition.
Behavioral and Brain Sciences 20: 723-767, 1997.
- Vision as mastery of sensorimotor contingencies
- O'Regan and Noe, A sensorimotor account
of vision and visual consciousness.
Behavioral and Brain Sciences 24: 939-1031, 2001.
- Recognizing and learning object categories
- Tracking visual objects
- Forsyth and Ponce, Computer Vision: A Modern Approach, 2003, ch. 17.
- Peter Corke, The Machine Vision Toolbox, IEEE Robotics and
Automation Magazine, December 2005.
- Connecting Symbols to the World (3 weeks: March 21 - April 6)
- Semiotic schemas
- Deb Roy, Semiotic schemas: a framework for grounding language
in action and perception.
Artificial Intelligence 167: 170-205, 2005.
- Learning to talk about events
- Dominey and Boucher, Learning to talk about events from narrated video
in a construction grammar framework.
Artificial Intelligence 167: 31-61, 2005.
- Evolving language and grammar
- Paul Vogt, The emergence of compositional structure in perceptually
grounded language games.
Artificial Intelligence 167: 206-242, 2005.
- Embodied multimodal language learning
- Yu and Ballard, On the integration of grounding language
and learning objects. AAAI, 2004.
- Yu and Ballard, A multimodal learning interface for grounding
spoken language in sensorimotor experience.
ACM Trans. Applied Perception 1, 2004.
- Also see Yu, Ballard and Aslin, The role of embodied intention
in early lexical acquisition.
Cognitive Science 29(6): 961-1005, 2005.
- Image schemas (Mark Johnson; Beate Hampe)
- Mark Johnson, The Body in the Mind, 1987.
- Cognitive Architecture (2 weeks: April 11 - 20) [Kuipers]
- Consciousness: responding to John Searle
- Searle, Mind: A Brief Introduction, 2004, chap.5.
- Kuipers, Consciousness: Drinking from the firehose of experience.
- Global Workspace Theory according to Bernard Baars
- Baars, A Cognitive Theory of Consciousness, 1988.
- Baars, In the Theater of Consciousness, 1997.
- Perceptual Meaning Analysis according to Jean Mandler
- Mandler, The Foundations of Mind, 2004.
- Project Reports (2 weeks: April 25 - May 4)
Each class member will do a term project. You can apply a method we
are learning about to a robot learning problem. Or you can extend an
existing method or develop a new method to solve a problem. Ideally,
your term project will extend the state of the art, and will be
suitable for submission to AAAI, ICRA, IROS, or some other major conference.
You are encouraged to select a topic that fits well with your other
research interests.
Possible Project Topics
The following is an incomplete list of project topic suggestions. More can
be added, and you can propose ideas of your own. In some cases,
you will see several projects that are closely related, and might
build on each other. People working on those projects should
consider coordinating their efforts.
- Distinguishing and characterizing sensor modalities. Can
we use modern Bayesian methods to do a better job than [P+K,97] of
separating an uninterpreted sense vector into distinct sensor
modalities? As an example, here is a directory
of sensor traces collected from
Lassie, a Magellan Pro robot with
laser range-finder, sonar, infrared, bump sensors, and odometry.
Once these are separated into modalities, can you also distinguish
between useful values and error values of each sensor? Can you find
additional sensor traces on the web for testing your method? Also,
look at some interesting related work by
Philipona on uninterpreted sensorimotor systems.
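As one possible starting point, here is a minimal sketch of grouping raw sense-vector elements by the similarity of their traces, in the spirit of [P+K,97]; the trace file name and format, and the cut threshold, are assumptions. A modern Bayesian treatment would be the actual project.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

trace = np.loadtxt("lassie_trace.txt")   # hypothetical file: (T, n_sensors)

# Distance between two raw sensor elements = correlation distance of
# their time series; elements within one modality should covary.
dist = pdist(trace.T, metric="correlation")
tree = linkage(dist, method="average")

# The cut threshold is a free parameter; a Bayesian method would
# replace this whole step.
modality = fcluster(tree, t=0.5, criterion="distance")
print(modality)   # cluster label for each raw sensor element
```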
- Identifying cross-modal relationships. Given a
separation of the sense vector into distinct modalities, replicate
the multidimensional scaling of each modality from [P+K,97]. Can you
determine how the different sensory modalities relate to each other,
providing different kinds of information about the same underlying
reality? For example, laser and sonar are both range sensors, but
they have different units, different resolution, different coverage,
different error models, and so on. IR and bump sensors are also
range sensors, but with even more differences. Odometry is related
to range, but less directly. Can you detect and characterize that
relationship? Can you use the cross-modal relationships to find new
concepts? Look at the work of Michael Coen
on cross-modal clustering (AAAI-05) and multi-modal integration
(IJCAI-01). (His recent work is on clustering of speech sounds and
bird songs. Can we apply it to robot sensors?)
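A minimal sketch of the simplest version of the preceding question, assuming the modality split is already known: check whether crude per-step summaries of two modalities covary over a trace. The file name, column assignments, and summary statistic are all illustrative.
```python
import numpy as np

trace = np.loadtxt("lassie_trace.txt")   # hypothetical trace: (T, n_sensors)
laser = trace[:, :180]                   # assumed column ranges for two
sonar = trace[:, 180:196]                # modalities found in the prior step

# Crude per-step summaries; high |r| suggests the two modalities report
# on the same underlying quantity (here, range to nearby surfaces).
r = np.corrcoef(laser.mean(axis=1), sonar.mean(axis=1))[0, 1]
print(f"cross-modal correlation: {r:.2f}")
```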
- Learning allocentric space from egocentric experience.
In an egocentric frame of reference, when the robot acts, everything
in its world moves around it. In a world-centered ("allocentric")
frame of reference, the static parts of the world have a fixed
location and only the robot itself moves (for now, assume the robot
is the only dynamic part of the environment). A range-sensor trace
captures the robot's egocentric experience with its environment. Is
there an analysis method (clever use of ICA?) that will identify a
useful world-centered ("allocentric") frame of reference for that
environment? This amounts to creating a map of the static world, and
updating the robot's pose as it moves within that frame of reference.
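The learning question above is open, but a sketch of the target representation may help orient the project: given a pose estimate, each egocentric range reading maps to an allocentric point. The discovery problem is to arrive at something like this mapping without being handed the pose.
```python
import numpy as np

def to_allocentric(pose, beam_angles, ranges):
    """beam_angles are directions in the robot frame; pose = (x, y, theta)."""
    x, y, theta = pose
    world = theta + beam_angles              # rotate into the world frame
    return np.stack([x + ranges * np.cos(world),
                     y + ranges * np.sin(world)], axis=1)

beam_angles = np.linspace(-np.pi / 2, np.pi / 2, 181)   # a 180-degree scan
points = to_allocentric((1.0, 2.0, 0.3), beam_angles, np.full(181, 4.0))
```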
- Learning sensors with overlapping receptive fields. Vulcan, our intelligent wheelchair, has a
pair of laser rangefinders mounted at its front corners, facing 45
degrees outward from the front axis, so their fields of view overlap
by about 90 degrees. Assume (or demonstrate) that you can determine
that these are two different sensors of the same type. Can the robot
autonomously learn that the fields of view overlap? How should it
represent the sensor input in the overlap region? How should it
represent the sensor input in the non-overlapping regions? Can it
build a better fused image of its nearby surround, using the
information from the two laser rangefinders?
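One hedged starting point for the overlap question: beams that view the same region of space should produce correlated readings over a long trace, so a beam-by-beam correlation matrix between the two rangefinders flags candidate overlap pairs. File names, array shapes, and the threshold below are assumptions.
```python
import numpy as np

left = np.loadtxt("vulcan_left.txt")     # hypothetical: (T, n_beams) each
right = np.loadtxt("vulcan_right.txt")

# Normalize each beam's time series, then correlate every left beam
# against every right beam.
L = (left - left.mean(0)) / (left.std(0) + 1e-9)
R = (right - right.mean(0)) / (right.std(0) + 1e-9)
corr = L.T @ R / len(left)               # (n_beams, n_beams)

# Strongly correlated pairs are candidates for the overlapping region.
overlap_pairs = np.argwhere(corr > 0.9)  # threshold is a free parameter
```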
- Learning the structure of the retina. Vision provides
vastly more information than laser range-finders, and human vision is
foveated, with much higher resolution in the fovea than in the
periphery. Can the structure of the retina be learned from
experience the way we have described for range-finders? (This
learning could take place during evolution or embryogenesis rather
than infancy.) For information on child development of vision, see
Hainline, The development of basic visual abilities, in Slater (Ed.),
Perceptual Development, 1998. (Also, be sure to read
the O'Regan and Noe paper in connection with this project.)
A simple way to define an experimental foveated retina is to create a
version of the "roving eye" consisting of three concentric square
retinas. The center one has full resolution of the image it moves
over. The next one out has half the resolution, so its "pixels"
represent 2x2 regions of the image. The outer one halves the
resolution again, so its "pixels" represent 4x4 regions of the image.
- Learning binocular tracking with foveated retinas.
Suppose an agent has two eyes with foveated retinas (forward-facing
like humans). The correlation between the images in the two eyes
will be maximized when the two eyes have the same object in the
foveal region (because that's where most of the pixels are). Can an
agent that starts without any visual tracking ability learn that
capability by learning to move the two eyes to keep their correlation
maximized? This would allow tracking to be learned before object
recognition, and would provide more information for learning object
recognition later. If you can come up with a learning scheme for
this, how does it relate to the evidence about how human infants
learn visual tracking? (Note that there are many online databases of
test images. Google: stereo pair image database)
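A toy sketch of the objective (not the learning mechanism): search over small fixation changes for the move that maximizes correlation with the other eye's fovea. The policy is hand-coded here just to make the learning signal concrete; a real learner would have to discover it.
```python
import numpy as np

def correlation(a, b):
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def refixate(image, other_fovea, cx, cy, k=8, radius=2):
    """Try small fixation shifts; keep the one whose foveal patch best
    correlates with the other eye's fovea."""
    best = (-2.0, cx, cy)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            patch = image[cy+dy-k:cy+dy+k, cx+dx-k:cx+dx+k]
            best = max(best, (correlation(other_fovea, patch), cx+dx, cy+dy))
    return best   # (correlation, new cx, new cy)
```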
- Learning object-based stereo range-finding from foveated
tracking. Suppose an agent has a stereo pair of foveated eyes,
and the capability of tracking maximally correlated image-pairs.
Even looking at a single stereo pair of images, there will be
multiple local correlation maxima, corresponding to objects at
different distances and directions from the agent. Can the agent
learn to use stereo disparity, the difference in axis-offset
of the two retinas, to estimate the distance to the object? Over
what range of distances would this be effective? What would be
required for the agent to learn that this feature represents distance?
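For reference, the standard pinhole-stereo relation that the agent would in effect have to discover: depth is inversely proportional to disparity. The focal length and baseline below are made-up rig parameters, not learned quantities.
```python
# f_px is focal length in pixels; baseline_m is the distance between eyes.
def depth_from_disparity(disparity_px, f_px=500.0, baseline_m=0.06):
    return f_px * baseline_m / disparity_px   # meters; needs disparity > 0

# With these numbers a 10-pixel disparity puts the object at 3 m; once
# disparity falls below about a pixel, the estimate becomes useless,
# which bounds the effective range of the method.
print(depth_from_disparity(10.0))   # 3.0
```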
- Learning primitive actions for physical robots. Given
sensor traces separated into distinct sensor modalities, replicate
the multidimensional scaling from [P+K,97] that gives the layout of
the sensor array. Then replicate the identification of primitive
actions from [P+K,97] by collecting the average sensory flow vectors
that result from random unit motor vectors. Perform PCA on these
high-dimensional vectors to determine the natural motor primitives
for the physical robot. Apply this to several different physical robots.
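A minimal sketch of the PCA step, assuming the average flow vectors have already been collected into a file; names and shapes are illustrative.
```python
import numpy as np

# flows[i] = average sensory flow observed after random unit motor
# command i; shape (n_commands, n_sensors). Assumed collected beforehand.
flows = np.load("avg_flows.npy")            # hypothetical file
flows = flows - flows.mean(axis=0)          # center before PCA

U, S, Vt = np.linalg.svd(flows, full_matrices=False)
primitives = Vt[:2]    # for a differential-drive robot, expect something
                       # like "advance" and "turn" to dominate
print((S**2 / (S**2).sum())[:5])            # variance explained per component
```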
- Determine the layout of general distributed sensor
networks. Many network experts are developing methods to
determine the embedding geometry of distributed networks of various
kinds, including our own Lili Qiu. Can we make progress on this
problem, using our methods for determining sensor array geometry,
basing inter-pixel distance
estimates on the similarity of the sense information provided?
Review the literature on sensor network geometry, discuss the issue
with Lili Qiu and her students, think deeply about the question, and
show what our methods can contribute to this problem.
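Classical multidimensional scaling is one concrete baseline for turning pairwise distance estimates into a layout, for sensor arrays and possibly for networks; the distance file below is a placeholder.
```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed points from an (n, n) matrix of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]       # keep the top eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

D = np.load("sensor_distances.npy")          # placeholder distance estimates
layout = classical_mds(D)                    # (n, 2) coordinate estimate
```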
- Autonomous learning of sensory features, distinctive states,
and high-level actions. Jefferson Provost's SODA architecture
uses self-organizing maps to learn distinctive sensory features, then
learns hill-climbing control laws to maximize the features at
distinctive states, and trajectory-following control laws to move
from one distinctive state to the neighborhood of the next. His work
is done using a simulated robot. Can you replicate it using a sensor
trace from Lassie? Use the available sensor trace (repeatedly) to
train the SOM. Use the same sensor trace to build an occupancy grid
map of the environment it was exploring. Then use that map in
Player/Stage as the environment for further exploration by a
simulated robot with the same sensor. Use the simulator to get the
large amounts of experience required to train the hill-climber and
trajectory-follower. Then show that the learned high-level actions
can be used to control Lassie's physical travel in the environment.
(This project is very ambitious. If you take it on, we may have to
collect new sensor traces for you, but you can start with the existing ones.)
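For orientation, here is the basic self-organizing-map update at the heart of the first stage; grid size, learning rate, neighborhood width, and input dimensionality are illustrative, not Provost's settings.
```python
import numpy as np

rng = np.random.default_rng(0)
units = rng.random((25, 180))    # 25 units on a 5x5 grid; 180-d input

def som_step(x, units, lr=0.1, sigma=2.0):
    """One SOM update: find the winning unit, then pull it and its
    grid neighbors toward the input."""
    winner = int(np.argmin(((units - x) ** 2).sum(axis=1)))
    wy, wx = divmod(winner, 5)
    for i in range(len(units)):
        iy, ix = divmod(i, 5)
        h = np.exp(-((iy - wy) ** 2 + (ix - wx) ** 2) / (2 * sigma ** 2))
        units[i] += lr * h * (x - units[i])

for x in np.loadtxt("lassie_trace.txt"):   # hypothetical 180-d sensor rows
    som_step(x, units)
```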
- Prerequisites and consequences of reliable actions.
Gary Drescher's schema mechanism can be seen as a method for searching
for the prerequisites and consequences of actions, evaluating schemas
for their reliability.
In his PhD thesis, Harold Chaput used hierarchical SOMs to create a much
more tractable replication of Drescher's schema mechanism.
Understand and replicate Chaput's Constructivist Learning
Architecture (CLA). Devise a new simulated-robot learning task, to
evaluate CLA as a way to learn reliable higher-level actions. Does
it complement or compete with SODA?
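A toy version of the statistic at the heart of marginal attribution, deliberately minimal and not Drescher's or Chaput's implementation: track how much more likely a result is when a given action is taken in a given context than otherwise.
```python
from collections import defaultdict

# counts[(context, action)] = [times result followed, total trials]
counts = defaultdict(lambda: [0, 0])

def record(context, action, result_occurred):
    c = counts[(context, action)]
    c[0] += int(result_occurred)
    c[1] += 1

def reliability(context, action):
    followed, trials = counts[(context, action)]
    return followed / trials if trials else 0.0

# A schema context/action -> result is worth keeping when the result is
# much more likely with the action than without it, e.g.
# reliability("near-wall", "forward") >> reliability("near-wall", "turn").
```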
- Representing directly-perceived quantities. In very low-level
learning and cognition, we face the foundational problem of how to
represent basic directly-perceived quantities, ranging from distance
and angle, to color or texture, to the magnitude of a sensation such
as weight or loudness. What is stored? How can it be retrieved?
One hypothesis is that what is stored is a high-resolution
representation of the actual sensory input. However, this cannot be
used directly to generate a response to a query. Rather, a proposed
response is generated (physically or virtually), starting at some
default value. Only an ordinal comparison (greater, equal, less) is
possible between the stored value and the proposed response. The
proposed response is adjusted until the two "match" (within
reasonable tolerances). This hypothesis appears to match certain
psychophysical experiments. Review the psychophysics literature on
this, create and implement a computational model suited to the
experimental tasks in the literature, and test your model against the
observed results. Does the model make new predictions that could be
tested experimentally?
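A minimal computational reading of this hypothesis, with the comparison oracle, step schedule, and tolerance all free modeling choices:
```python
def reproduce(stored, compare, start=1.0, step=0.5, tol=0.02):
    """compare(stored, proposal) may return only -1, 0, or +1."""
    proposal = start
    while step > tol:
        c = compare(stored, proposal)
        if c == 0:                  # "match" within tolerance
            break
        proposal += step if c > 0 else -step
        step *= 0.9                 # shrink adjustments over time
    return proposal

# A stand-in ordinal oracle with a fixed comparison tolerance:
cmp = lambda s, p: 0 if abs(s - p) < 0.05 else (1 if s > p else -1)
print(reproduce(3.2, cmp))          # settles near 3.2
```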
Our own work on these problems starts with
[Pierce and Kuipers, AIJ, 1997], which learns the foundations
for the Spatial Semantic Hierarchy
[Kuipers, AIJ, 2000]. It would be helpful to read these in advance.
Valuable reading for the course
The following books have valuable insights related to this course,
and are well worth reading.
- Bernard J. Baars.
In the Theater of Consciousness.
Oxford University Press, 1997.
Available as a required book at the University Co-op.
- Jean Mandler.
The Foundations of Mind: Origins of Conceptual Thought.
Oxford University Press, 2004.
Quite important; I recommend that you order a copy.
- Mark Johnson.
The Body in the Mind: The Bodily Basis of Meaning,
Imagination, and Reason.
University of Chicago Press, 1987.
- Gary L. Drescher.
Made-Up Minds: A Constructivist Approach
to Artificial Intelligence.
MIT Press, 1991.
- George Lakoff and Mark Johnson.
Metaphors We Live By, second edition.
University of Chicago Press, 2003.
Valuable books for your library
The following are some useful books that you should have in your
professional library, and that are related to this course. I will
assume that you have immediate access to material in these books.
- Duda, Hart and Stork. 2001.
Pattern Classification, Second Edition.
NY: John Wiley and Sons.
- Tom Mitchell. 1997. Machine Learning.
(This book is a useful reference, and is the required text for
Ray Mooney's Machine Learning course.)
If you do not already have a background in Artificial Intelligence,
the following excellent textbook would be another valuable addition to
your library, and is undoubtedly available used.
- Stuart Russell and Peter Norvig. Artificial
Intelligence: A Modern Approach. Prentice-Hall.
Some assignments and projects may best be done in a high-level
programming environment such as R, MATLAB, or LabVIEW.
Make sure you have any documentation you need.
The Computer Science Department has a Code of Conduct that describes
the obligations of faculty and students. Read it at