Multimodal Sparse Features for Object Detection

Authors:
Martin Haker; Thomas Martinetz; Erhardt Barth
Affiliations:
Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany 23538; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany 23538; Institute for Neuro- and Bioinformatics, University of Lübeck, Lübeck, Germany 23538
Venue:
ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part II
Year:
2009

Abstract

In this paper, the sparse coding principle is employed to represent multimodal image data, i.e. image intensity and range. We estimate an image basis for frontal face images taken with a Time-of-Flight (TOF) camera to obtain a sparse representation of facial features, such as the nose. These features are then evaluated in an object detection scenario: we estimate the position of the nose by template matching and then apply thresholds that are estimated from a labeled training set. The main contribution of this work is to show that the templates can be learned simultaneously on both intensity and range data based on the sparse coding principle. These multimodal templates significantly outperform both templates generated by averaging over a set of aligned image patches containing the facial feature of interest and multimodal templates computed via Principal Component Analysis (PCA). The system achieves an average detection rate of 96.4% with a false positive rate of 3.7%.
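
The detection step described above (template matching on intensity and range, followed by a threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the matching score, so the sketch assumes a normalized correlation averaged over the two modalities. It also assumes the multimodal template is already given, whereas the paper learns it via sparse coding. The function names `match_scores` and `detect` and the threshold value are hypothetical.

```python
import numpy as np


def _unit(patch):
    """Flatten a patch, remove its mean, and scale it to unit length."""
    v = patch.astype(float).ravel()
    v = v - v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def match_scores(intensity, depth, tmpl_intensity, tmpl_depth):
    """Slide a two-channel template over an intensity/range image pair.

    Assumption: the score is the mean of the per-modality normalized
    correlations, so a perfect match in both channels scores 1.0.
    """
    h, w = tmpl_intensity.shape
    H, W = intensity.shape
    t_i, t_d = _unit(tmpl_intensity), _unit(tmpl_depth)
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            s_i = _unit(intensity[y:y + h, x:x + w]) @ t_i
            s_d = _unit(depth[y:y + h, x:x + w]) @ t_d
            scores[y, x] = 0.5 * (s_i + s_d)
    return scores


def detect(scores, threshold):
    """Return the (row, col) of the best match, or None if it is below threshold.

    Assumption: a single hypothetical threshold stands in for the thresholds
    that the paper estimates from a labeled training set.
    """
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    if scores[y, x] < threshold:
        return None
    return int(y), int(x)
```

In this sketch, a call such as `detect(match_scores(intensity, depth, tmpl_i, tmpl_d), 0.9)` would return the top-left corner of the best-matching window, or `None` when no window is similar enough. Averaging the two correlations is only one way to combine modalities and was chosen here for simplicity.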