
Discussion and Conclusion

The model for visual object recognition presented here marks the extreme end of a scale, relying minimally on pre-existing structure. In fact, all it needs are some natural intracortical connection patterns, one stored example for each object to be recognized, and a simple mechanism of on-line self-organization in the form of rapid, reversible synaptic plasticity. This distinguishes it from many alternative neural models for object recognition, which require extensive control structures [2] or specific feature hierarchies created by training [5,10] before the first object can be recognized. The lateral connections within the image domain and the model domain of our system encode the a priori constraint that spatial continuity is preserved during the match. The match itself is realized by the rapid self-organization of the synaptic connections between image and models. This self-organization is controlled by signal correlations and by the feature similarity between image points and model points. For each object to be recognized, just a single model needs to be stored, which can be done with simple mechanisms of associative memory [18]. (To accommodate substantial rotation in depth, the object needs to be inspected from many angles and the resulting models fused into one model graph, a principle demonstrated in [11].) These properties of our system yield a clear-cut message concerning intracortical connections: visual object recognition can be understood on the basis of simple connectivity structures and mechanisms of plasticity that are already known today, or are at least well within the reach of existing experimental techniques.
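The self-organization of image-to-model connections can be illustrated with a deliberately simplified update rule. The sketch below is not the system's actual link dynamics: the names `update_links` and `reset_links`, the learning rate `eta`, and the row-wise normalization are assumptions chosen for illustration. It shows only the three ingredients named in the text: growth driven by correlated activity, gating by feature similarity, and reversibility.

```python
def update_links(W, S, a_model, a_image, eta=0.1):
    """One step of a simplified, similarity-gated Hebbian link update.

    W, S    : lists of rows; W[j][i] and S[j][i] are the link strength and
              the feature similarity between model node j and image node i
    a_model : activity of each model node
    a_image : activity of each image node
    eta     : hypothetical learning rate
    """
    new_W = []
    for j, row in enumerate(W):
        # Growth: a link strengthens when both of its end nodes are active,
        # scaled by how similar the features at the two nodes are.
        grown = [w + eta * S[j][i] * a_model[j] * a_image[i]
                 for i, w in enumerate(row)]
        # Competition: links converging on one model node share a fixed
        # budget, so growth of one link implicitly weakens its rivals.
        total = sum(grown)
        new_W.append([g / total for g in grown])
    return new_W


def reset_links(S):
    # Reversibility: after a trial the links fall back to their initial,
    # purely similarity-determined (row-normalized) state.
    out = []
    for row in S:
        total = sum(row)
        out.append([s / total for s in row])
    return out


if __name__ == "__main__":
    S = [[0.5] * 3 for _ in range(3)]  # toy case: fully ambiguous similarities
    W = reset_links(S)
    for _ in range(50):
        # Hypothetical repeated co-activation of model node 0 and image node 0.
        W = update_links(W, S, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
    print([round(w, 3) for w in W[0]])
```

Starting from uniform, maximally ambiguous similarities and repeatedly co-activating model node 0 and image node 0, the link between them comes to dominate all links converging on model node 0, while rows belonging to inactive model nodes stay unchanged. Dividing by the row sum is just one simple way to make links compete; other normalizations would serve the same illustrative purpose.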

Our model leaves open a number of questions regarding the structure of lateral connections, especially in the model domain. The global interaction between models (our "recognition dynamics") could be realized with a single cardinal cell per model, or it could take the form of a distributed set of connections between model neurons (as formulated in [18]). More work is required to decide this issue. The anatomy of the local interaction between models, the second term on the right-hand side of Equation 1, can only be discussed once the relative anatomical placement of different models has become clear. The extent and nature of the overlap between models, in terms of shared neurons and shared connections, must also be clarified first. Two extreme versions are conceivable: (1) models are laid down in mutual register in terms of internal position; or (2) there is a fixed spatial array of feature types in inferotemporal cortex (for which there is faint experimental evidence [15]), and laying down a model consists of selecting appropriate feature cells and connecting them as required by the model's inner structure. In the first case, the lateral model connections would be tidy and local within the cortical tissue (at least their excitatory part); in the second, they would form a diffuse fiber plexus without any apparent anatomical structure. A further aspect of intracortical connectivity that the present system ignores entirely is intra-hypercolumnar connectivity. It is implicitly present, being required to organize the necessary feature specificity and probably also to evaluate the feature similarity between a pair of hypercolumns ("nodes") in image and model.
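For the "single cardinal cell per model" option, the global interaction can be pictured as a winner-take-all competition. The sketch below uses replicator dynamics as a stand-in for such a competition; that choice, the function name `recognition_dynamics`, the match-quality list `F`, and the parameters `dt` and `steps` are all assumptions, not the system's actual equations.

```python
def recognition_dynamics(F, dt=0.5, steps=200):
    """Winner-take-all competition among models, one 'cardinal cell' each.

    F     : hypothetical match quality of each model, assumed to lie in [0, 1]
    dt    : Euler step size (with F in [0, 1], dt <= 0.5 keeps r positive)
    steps : number of update steps
    Returns the cardinal-cell activities r, which stay positive and sum to 1.
    """
    n = len(F)
    r = [1.0 / n] * n  # start with no preference for any model
    for _ in range(steps):
        # Population-average match quality, weighted by current activity.
        mean_fit = sum(rk * fk for rk, fk in zip(r, F))
        # Replicator update: models matching better than average gain
        # activity, the others lose it; the total activity is conserved.
        r = [rk + dt * rk * (fk - mean_fit) for rk, fk in zip(r, F)]
    return r


if __name__ == "__main__":
    print([round(x, 4) for x in recognition_dynamics([0.3, 0.8, 0.5])])
```

With the toy input `[0.3, 0.8, 0.5]`, nearly all activity ends up on the second model. A distributed implementation, the alternative mentioned in the text, would replace the explicit population average with connections among model neurons; the sketch says nothing about which of the two the cortex uses.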

Last, and by no means least, we have given short shrift to the inter-areal organization of connections, lumping all early visual areas into one image domain and all inferotemporal areas into one model domain. Within the image domain, two extreme views could be taken. i) The different areas (V1, V2 and V4, for instance) represent different mixtures of feature specificities and are tied together by rigid, retinotopically organized connections. In that case areal structure could be ignored for the purposes of our present system, and neurons in different areas that subserve the same retinal point could simply be lumped together into one "hyper-hypercolumn." ii) The synaptic projection systems between areas are substantially reorganized during the recognition process, the areas perhaps forming sequential layers that connect V1 indirectly with IT, as proposed in [2]. Such an indirect connectivity scheme might reduce the enormous number of fibers our system requires to connect every point in the image with every point in the models.
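The fiber-count argument in view ii) can be made concrete with a rough count. The sketch compares direct all-to-all connectivity with a hypothetical multistage scheme in which every stage has limited fan-in. It treats the image domain and the combined model domain as having the same number `n` of nodes, a simplifying assumption; the function names and the layered scheme itself are illustrative and are not a model proposed in the text.

```python
def direct_fibers(n):
    # Direct scheme: every image node has its own fiber to every model node.
    return n * n


def multistage_fibers(n, fan_in):
    # Assumed layered scheme: each stage has n nodes, each receiving fan_in
    # fibers from the stage below. Stages are added until fan_in ** stages
    # >= n, so that any input can in principle be routed to any output.
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    stages, reach = 0, 1
    while reach < n:
        reach *= fan_in
        stages += 1
    return n * fan_in * stages


if __name__ == "__main__":
    n, k = 1024, 4  # hypothetical sizes, chosen only for illustration
    print(direct_fibers(n), multistage_fibers(n, k))
```

With the hypothetical values `n = 1024` and `fan_in = 4`, the direct scheme needs 1,048,576 fibers and the layered one 20,480. The saving comes at a price: the intermediate stages must themselves be configured correctly for each match, which is exactly the kind of reorganization of inter-areal projections that view ii) envisages.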

There is one apparent mismatch between our system and the reality of object recognition in the adult brain: the time the process takes. There are reports that human subjects can distinguish objects of different types in less than a tenth of a second [14]. In contrast, our system requires many hundreds of sequential steps. It is not easy to interpret these steps in terms of biological real time. The essential parameter, however, seems to be the temporal resolution with which signal correlations can be evaluated in the brain. This issue is currently under heated discussion [12,13], but there is little hope that this resolution is better than one or a few milliseconds. In that case, the hundreds of sequential steps required by our system translate into many hundreds of milliseconds, which is unrealistically long. Dynamic Link Matching needs this time to reduce the enormous ambiguity in the feature similarities between image and object points to a sparse set of connections between corresponding points. If this ambiguity could be decisively reduced with highly specific feature types (in the extreme case, private to one object type), recognition time could be cut drastically. The feature types we use, Gabor-based wavelets, are very general and unspecific, and it is likely that highly specific features can only be generated by a learning mechanism. In our view, the basic mechanism of our system is used by the young animal to store and recognize objects early in life. At first each recognition process may take seconds, but the mechanism can serve as the basis for very efficient learning of specific feature types, a process that, thanks to the Dynamic Link Mechanism, is not hampered by confusion between different objects.
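The timing argument can be written out as back-of-envelope arithmetic. Here $n$ is the number of sequential steps and $\Delta t$ the temporal resolution of correlation evaluation; taking "many hundreds" of steps as at least $n \approx 300$ is an illustrative assumption, and $\Delta t \approx 1$ ms is the most favorable resolution mentioned in the text.

```latex
T_{\text{DLM}} \approx n\,\Delta t \gtrsim 300 \times 1\,\text{ms} = 300\,\text{ms}
\qquad\text{vs.}\qquad
T_{\text{obs}} < 100\,\text{ms}
\;\Longrightarrow\;
n_{\max} \approx \frac{T_{\text{obs}}}{\Delta t} = \frac{100\,\text{ms}}{1\,\text{ms}} = 100 .
```

Even under the best-case resolution, the observed recognition time leaves room for only about a hundred sequential steps. This is why reducing the number of steps, by making features specific enough to cut the initial ambiguity, is the lever the argument points to.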

The most encouraging aspect of our system is its evident capability to solve the invariant object recognition problem despite all the difficulties posed by real images, and despite the large number and great structural overlap of the objects to be distinguished. This sets it in sharp contrast to proposed recognition mechanisms that work only on simple toy examples. We therefore feel that this system is a foot in the door and that its remaining difficulties can be solved gradually. What matters in the context of the present book is the light our system sheds on the functional role of lateral connections in visual cortex.
