VISOR: Learning Visual Schemas in Neural Networks for Object Recognition and Scene Analysis (1994)
This dissertation describes VISOR, a neural network system for object recognition and scene analysis. The research with VISOR has three general goals: (1) to contribute to building robust, general vision systems that can be adapted to different applications, (2) to contribute to a better understanding of the human visual system by modeling high-level perceptual phenomena, and (3) to address several fundamental problems in the neural network implementation of intelligent systems, including resource-limited representation and the representation and learning of structured knowledge. These goals motivate a schema-based approach to visual processing and focus the research on the representation and learning of visual schemas in neural networks. Given an input scene, VISOR focuses attention on one component of an object at a time and extracts the component's shape and position. The schemas, represented as a hierarchy of maps and the connections between them, cooperate and compete to determine which one best matches the input. VISOR keeps shifting attention to other parts of the scene, reusing the same schema representations to identify the objects one at a time, until it recognizes what the scene as a whole depicts. The recognition result consists of labels for the individual objects and for the entire scene. VISOR also learns to encode the schemas' spatial structures through unsupervised modification of connection weights; reinforcement feedback from the environment determines whether to adapt an existing schema or create a new one to represent a novel input. VISOR's operation is based on cooperative, competitive, and parallel bottom-up and top-down processes that seem to underlie many human perceptual phenomena. VISOR can therefore provide a computational account of many such phenomena, including shifting of attention, the priming effect, perceptual reversal, and circular reaction, and may lead to a better understanding of how these processes are carried out in the human visual system. Compared with traditional rule-based systems, VISOR's recognition is remarkably robust, and it can indicate how confident its analysis is as the inputs differ increasingly from the learned schemas. With these properties, VISOR is a promising first step towards a general vision system that can be used in different applications after learning the application-specific schemas.
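The abstract gives no algorithmic detail, so the following is only a toy illustration of the general idea of competitive schema matching with feedback-driven learning, not VISOR's actual method. It assumes a flat representation in place of VISOR's hierarchy of maps: each schema is a label plus the expected (x, y) position of each named component. Schemas compete winner-take-all on a Gaussian match score, and the winner's score doubles as a confidence that falls as the input drifts away from the schema. Feedback is modeled, as an assumption, as the correct label supplied by the environment, and it decides whether the winner is adapted or a new schema is created. All names (`Schema`, `match_score`, `compete`, `learn`) and parameters (`sigma`, `threshold`, `rate`) are hypothetical.

```python
import math
from dataclasses import dataclass


# HYPOTHETICAL toy model, not VISOR's architecture: a schema is a label plus
# the expected (x, y) position of each named component shape.
@dataclass
class Schema:
    label: str
    parts: dict[str, tuple[float, float]]


def part_evidence(expected, observed, sigma=1.0):
    """Gaussian similarity in (0, 1] between expected and observed positions."""
    dx = expected[0] - observed[0]
    dy = expected[1] - observed[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))


def match_score(schema, components):
    """Mean evidence over the schema's parts; a missing part contributes 0."""
    if not schema.parts:
        return 0.0
    total = sum(
        part_evidence(pos, components[shape])
        for shape, pos in schema.parts.items()
        if shape in components
    )
    return total / len(schema.parts)


def compete(schemas, components):
    """Winner-take-all: return the best-matching schema and its score,
    which serves as a confidence that falls as the input drifts away."""
    scores = [match_score(s, components) for s in schemas]
    best = max(range(len(schemas)), key=lambda i: scores[i])
    return schemas[best], scores[best]


def learn(schemas, components, feedback_label, threshold=0.5, rate=0.3):
    """Feedback-gated update (assumed rule): adapt the winner if the feedback
    confirms its label and the match is good; otherwise add a new schema."""
    if schemas:
        winner, score = compete(schemas, components)
        if winner.label == feedback_label and score >= threshold:
            # Move each matched part a fraction `rate` toward the observation.
            for shape, (ox, oy) in components.items():
                if shape in winner.parts:
                    ex, ey = winner.parts[shape]
                    winner.parts[shape] = (ex + rate * (ox - ex), ey + rate * (oy - ey))
            return "adapted"
    # Novel input (or rejected match): store it as a new schema.
    schemas.append(Schema(feedback_label, dict(components)))
    return "created"


if __name__ == "__main__":
    schemas = [
        Schema("house", {"roof": (0.0, 2.0), "door": (0.0, 0.0)}),
        Schema("car", {"wheel": (-1.0, 0.0), "body": (0.0, 1.0)}),
    ]
    # Components as if extracted by successive attention fixations.
    close = {"roof": (0.2, 2.1), "door": (0.0, -0.1)}
    distorted = {"roof": (1.5, 3.0), "door": (1.0, 1.0)}
    for scene in (close, distorted):
        winner, confidence = compete(schemas, scene)
        print(winner.label, round(confidence, 3))
    print(learn(schemas, close, "house"))
    print(learn(schemas, {"mast": (0.0, 3.0), "hull": (0.0, 0.0)}, "boat"))
```

In this sketch the distorted scene is still labeled "house", but with a much lower score than the close one, loosely mirroring the graded confidence the abstract describes. The unfamiliar "boat" input matches no stored schema, so the feedback-gated rule creates a new schema rather than adapting an existing one.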
PhD Thesis, Department of Computer Sciences, The University of Texas at Austin. 198 pages. Technical Report AI-94-219.

Wee Kheng Leow, Ph.D. alumnus. Contact: leowwk [at] comp nus edu sg