Common sense, and hence most other human knowledge, is built on
knowledge of a few foundational domains, such as space, time, action,
objects, and causality. We are investigating how this knowledge
can be learned from unsupervised sensorimotor experience. We assume
that an agent, human or robot, starts with a low-level ontology for
describing its sensorimotor interaction with the world. We call this
the "pixel level". William James called it the "blooming, buzzing
confusion". The learning task is to create useful higher-level
representations for space, time, actions, objects, etc., to support
effective planning and action in the world.

The basic idea behind bootstrap learning is to compose multiple
machine learning methods, using weak but general unsupervised or
delayed-reinforcement learning methods to create the prerequisites for
applying stronger but more specific learning methods such as abductive
inference or supervised learning. An important common theme of all
this work is the learning of a higher-level ontology of places,
objects, and their relationships, based on the low-level "pixel
ontology" of direct experience. These learning methods create new
symbols and categories, solving the symbol grounding problem for these
symbols, and defining the symbols in terms of the agent's own
experience, not the experience of an external teacher or programmer.
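
The composition described above can be illustrated with a minimal sketch.
It is not the method used in this work, only one possible instance of the
idea under loud assumptions: a hypothetical three-place world (PLACES), a
two-dimensional noisy "pixel-level" reading, k-means standing in for the
weak but general unsupervised method, and a simple counting predictor
standing in for the stronger, more specific method. All function names
are invented for illustration. The agent never sees place indices; the
cluster numbers it creates are its own symbols, and the training targets
for the second stage come from its own next observation rather than from
a teacher.

```python
import random
from collections import Counter, defaultdict

# Hypothetical world: three places, each with a prototype sensor vector.
PLACES = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]


def sense(place, rng):
    """Noisy raw reading at a place; the agent never sees the place index."""
    x, y = PLACES[place]
    return (x + rng.uniform(-0.5, 0.5), y + rng.uniform(-0.5, 0.5))


def dist2(a, b):
    """Squared Euclidean distance between two readings."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(p, centers):
    """Index of the closest cluster center: the agent's symbol for reading p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))


def kmeans(points, k, iters=10):
    """Weak, general stage: cluster unlabeled readings into k symbols."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-point initialisation
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [
            tuple(sum(v) / len(v) for v in zip(*groups[i])) if groups[i] else centers[i]
            for i in range(k)
        ]
    return centers


def collect_experience(steps=120, seed=0):
    """Walk back and forth along the places, logging raw experience only."""
    rng = random.Random(seed)
    pattern = [+1, +1, +1, -1, -1, -1]  # covers every (place, action) pair
    place, log = 0, []
    for t in range(steps):
        action = pattern[t % len(pattern)]
        nxt = min(max(place + action, 0), len(PLACES) - 1)  # walls clamp motion
        log.append((sense(place, rng), action, sense(nxt, rng)))
        place = nxt
    return log


def learn_model(log, k=3):
    """Stronger, specific stage: predict the next symbol from (symbol, action)."""
    readings = [r for r, _, _ in log] + [n for _, _, n in log]
    centers = kmeans(readings, k)
    counts = defaultdict(Counter)
    for reading, action, nxt in log:
        counts[(nearest(reading, centers), action)][nearest(nxt, centers)] += 1
    model = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return centers, model


centers, model = learn_model(collect_experience())
start = nearest(PLACES[0], centers)
# Does the learned model predict that moving +1 from the first place
# leads to the second place, expressed in the agent's own symbols?
print(model[(start, +1)] == nearest(PLACES[1], centers))
```

The second stage is only applicable because the first stage has already
turned continuous readings into discrete symbols; PLACES is consulted in
the final lines solely to check the result from the outside.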