Learning a dictionary of shape-components in visual cortex: comparison with neurons, humans and machines

Authors:
Tomaso Poggio; Thomas Serre
Affiliations:
Massachusetts Institute of Technology; Massachusetts Institute of Technology
Venue:
Thesis, Massachusetts Institute of Technology
Year:
2006

Cited by (14):

Robust Object Recognition with Cortex-Like Mechanisms

IEEE Transactions on Pattern Analysis and Machine Intelligence
Action Recognition Using a Bio-Inspired Feedforward Spiking Network

International Journal of Computer Vision
FPGA Implementations Comparison of Neuro-cortical Inspired Convolution Processors for Spiking Systems

IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part I: Bio-Inspired Systems: Computational and Ambient Intelligence
A model for learning topographically organized parts-based representations of objects in visual cortex: Topographic nonnegative matrix factorization

Neural Computation
Patch-based experiments with object classification in video surveillance

ACIVS'07 Proceedings of the 9th international conference on Advanced concepts for intelligent vision systems
A deep-learning model-based and data-driven hybrid architecture for image annotation

Proceedings of the international workshop on Very-large-scale multimedia corpus, mining and retrieval
Visual object tracking via sparse reconstruction

ICIMCS '10 Proceedings of the Second International Conference on Internet Multimedia Computing and Service
Visual object tracking via sample-based Adaptive Sparse Representation (AdaSR)

Pattern Recognition
Beyond sparsity: The role of L1-optimizer in pattern classification

Pattern Recognition
Action recognition via bio-inspired features: The richness of center-surround interaction

Computer Vision and Image Understanding
On cortex mechanism hierarchy model for facial expression recognition: multi-database evaluation results

ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part II
Letters: Enhancing sparsity via ℓp (0

Neurocomputing
A spiking neural network based cortex-like mechanism and application to facial expression recognition

Computational Intelligence and Neuroscience
Pose invariant face recognition using biological inspired features based on ensemble of classifiers

Journal of Visual Communication and Image Representation

Abstract

In this thesis, I describe a quantitative model that accounts for the circuits and computations of the feedforward path of the ventral stream of visual cortex. The model is consistent with a general theory of visual processing that extends the hierarchical model of [Hubel and Wiesel, 1959] from primary to extrastriate visual areas, and it attempts to explain the first few hundred milliseconds of visual processing, i.e., "immediate recognition". A key element of the approach is the learning of a generic dictionary of shape-components from V2 to IT, which provides an invariant representation to task-specific categorization circuits in higher brain areas. This vocabulary of shape-tuned units is learned in an unsupervised manner from natural images and constitutes a large, redundant set of image features with different complexities and invariances. The theory significantly extends an earlier approach by [Riesenhuber and Poggio, 1999a] and builds upon several existing neurobiological models and conceptual proposals.

First, I present evidence that the model can duplicate the tuning properties of neurons in various brain areas (e.g., V1, V4 and IT). In particular, the responses of model S2 units to pairs of simple bars presented within their receptive field agree with V4 data on neuronal responses to such two-bar combinations [Reynolds et al., 1999], and some of the model's C2 units show tuning for boundary conformations that is consistent with recordings from V4 [Pasupathy and Connor, 2001]. Second, I show that the model not only duplicates the tuning properties of neurons when probed with artificial stimuli, but can also recognize objects in real-world images well enough to compete with the best computer vision systems. Third, I compare the performance of the model with that of human observers in a rapid animal vs. non-animal recognition task, in which recognition is fast and cortical back-projections are likely to be inactive. The results indicate that the model predicts human performance extremely well when the delay between the stimulus and the mask is about 50 ms. This suggests that cortical back-projections may not play a significant role at such intervals, and that the model may therefore provide a satisfactory description of the feedforward path.

Taken together, the evidence suggests that we may have the skeleton of a successful theory of visual cortex. In addition, this may be the first time that a neurobiological model faithful to the physiology and anatomy of visual cortex not only competes with some of the best computer vision systems, thus providing a realistic alternative to engineered artificial vision systems, but also achieves performance close to that of humans in a categorization task involving complex natural images.

(Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
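The abstract refers to S2 and C2 units and to a dictionary of shape-components learned without supervision from natural images, but does not spell out the computations. In published descriptions of this model family, S units respond with a Gaussian-like tuning to a stored prototype, and C units take a max over positions (and scales) to gain invariance. The NumPy sketch below illustrates that idea at a single scale only, under loudly stated assumptions: the C1 maps are random stand-in arrays rather than real Gabor/max-pooled image responses, the dictionary is learned by the simplest "imprinting" scheme (cropping random patches of C1 activity), and the function names and the `sigma` parameter are hypothetical. It is not the thesis's exact learning rule or parameterization.

```python
import numpy as np


def sample_prototypes(c1_maps, n_protos, patch_size, rng):
    """Hypothetical unsupervised 'imprinting' of the S2 dictionary.

    Each prototype is a random patch cropped from a randomly chosen C1 map.
    c1_maps: list of arrays shaped (H, W, n_orientations), H, W >= patch_size.
    Returns an array shaped (n_protos, patch_size, patch_size, n_orientations).
    """
    protos = []
    for _ in range(n_protos):
        m = c1_maps[rng.integers(len(c1_maps))]
        h, w, _ = m.shape
        y = rng.integers(h - patch_size + 1)
        x = rng.integers(w - patch_size + 1)
        protos.append(m[y:y + patch_size, x:x + patch_size, :].copy())
    return np.stack(protos)


def s2_responses(c1, protos, sigma):
    """S2 units: Gaussian tuning to each prototype at every valid position.

    Response = exp(-||patch - prototype||^2 / (2 * sigma^2)), so it equals 1.0
    only where the input patch matches the prototype exactly.
    Returns an array shaped (n_protos, H - p + 1, W - p + 1).
    """
    n, p, _, _ = protos.shape
    h, w, _ = c1.shape
    out = np.empty((n, h - p + 1, w - p + 1))
    for y in range(h - p + 1):
        for x in range(w - p + 1):
            patch = c1[y:y + p, x:x + p, :]
            # Broadcast the patch against all prototypes at once.
            d2 = ((protos - patch) ** 2).sum(axis=(1, 2, 3))
            out[:, y, x] = np.exp(-d2 / (2.0 * sigma ** 2))
    return out


def c2_responses(s2):
    """C2 units: max over all positions, giving one shift-tolerant value per prototype."""
    return s2.reshape(s2.shape[0], -1).max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in C1 maps (random data, NOT real image responses).
    training_maps = [rng.random((16, 16, 4)) for _ in range(10)]
    dictionary = sample_prototypes(training_maps, n_protos=20, patch_size=4, rng=rng)
    new_map = rng.random((16, 16, 4))
    features = c2_responses(s2_responses(new_map, dictionary, sigma=1.0))
    print(features.shape)  # one C2 value per dictionary entry
```

The resulting C2 vector is the kind of invariant representation the abstract says is passed to task-specific categorization circuits; in this sketch a downstream classifier would simply take `features` as input.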