A Model of Saliency-Based Visual Attention for Rapid Scene Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Attentional Selection for Object Recognition A Gentle Way
BMCV '02 Proceedings of the Second International Workshop on Biologically Motivated Computer Vision
Object Recognition from Local Scale-Invariant Features
ICCV '99 Proceedings of the International Conference on Computer Vision, Volume 2
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision
Context-Based Detection of Keypoints and Features in Eye Regions
ICPR '96 Proceedings of the 13th International Conference on Pattern Recognition - Volume 2
Is bottom-up attention useful for object recognition?
CVPR '04 Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Dynamic visual attention model in image sequences
Image and Vision Computing
A robust approach to segment desired object based on salient colors
Journal on Image and Video Processing - Color in Image and Video Processing
Object recognition and segmentation in videos by connecting heterogeneous visual features
Computer Vision and Image Understanding
Bottom-up visual attention allows primates to quickly select image regions that contain salient objects. In artificial systems, restricting object recognition to these regions enables faster recognition and unsupervised learning of multiple objects in cluttered scenes. A problem with this approach is that objects superficially dissimilar to the target receive the same consideration in recognition as similar objects. Likewise, in video, objects recognized in previous frames at locations distant from the current fixation point receive the same consideration as objects previously recognized closer to the current target of attention, even though, due to the continuity of smooth motion, objects recently recognized near the current focus of attention have a high probability of matching the current target. Here we investigate rapid pruning of the facial recognition search space using the already-computed low-level features that guide attention, together with spatial information derived from previous video frames. For each video frame, Itti & Koch's bottom-up visual attention algorithm selects salient locations based on low-level features such as contrast, orientation, color, intensity, flicker, and motion; this algorithm has been shown to be highly effective at selecting faces as salient objects. Lowe's SIFT object recognition algorithm then extracts a signature of the attended object for comparison with the facial database. The database search is prioritized toward faces that better match the low-level features which guided attention to the current candidate, or that were previously recognized near the current candidate's location. The SIFT signatures of the prioritized faces are then checked against the attended candidate for a match.
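The prioritization step described above can be sketched as a simple ranking function. This is a minimal, hypothetical illustration, not the paper's implementation: the scoring weights, the Gaussian spatial bonus, and all names (`prioritize`, `recent_sightings`, `sigma`) are assumptions chosen to show how feature similarity and spatial proximity from previous frames could jointly order the database search.

```python
import math

def prioritize(candidates, attended_features, fixation_xy, recent_sightings,
               w_feature=1.0, w_spatial=1.0, sigma=50.0):
    """Return database face IDs ordered most-promising first.

    candidates: dict face_id -> low-level feature vector (list of floats)
    attended_features: feature vector at the current attended location
    fixation_xy: (x, y) of the current focus of attention
    recent_sightings: dict face_id -> (x, y) where it was last recognized
    """
    def feature_similarity(f):
        # Negative Euclidean distance: closer feature vectors score higher.
        return -math.dist(f, attended_features)

    def spatial_bonus(face_id):
        # Faces recently recognized near the fixation point get a
        # Gaussian proximity bonus (illustrative choice of kernel).
        if face_id not in recent_sightings:
            return 0.0
        d = math.dist(recent_sightings[face_id], fixation_xy)
        return math.exp(-(d * d) / (2 * sigma * sigma))

    scored = [(w_feature * feature_similarity(f) + w_spatial * spatial_bonus(fid), fid)
              for fid, f in candidates.items()]
    scored.sort(reverse=True)
    return [fid for _, fid in scored]
```

Faces at the front of the returned list would have their SIFT signatures checked first, so a match near the top ends the search early, which is the source of the claimed speedup.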
By comparing the performance of Lowe's recognition algorithm combined with Itti & Koch's bottom-up attention model, with and without search-space pruning, we demonstrate that our pruning approach improves the speed of facial recognition in video footage.