Cross-Modality Automatic Face Model Training from Large Video Databases

  • Authors:
  • Xiaodan Song; Ching-Yung Lin; Ming-Ting Sun

  • Affiliations:
  • University of Washington, Seattle, WA; IBM T.J. Watson Research Center, Hawthorne, NY; University of Washington, Seattle, WA

  • Venue:
  • CVPRW '04 Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 5
  • Year:
  • 2004

Abstract

Face recognition is an important issue in video indexing and retrieval applications. Usually, supervised learning is used to build face models for specific named individuals, but a traditional supervised learning framework requires a huge amount of labeling work. In this paper, we propose an automatic cross-modality training scheme, requiring no supervision, that uses automatic speech recognition of videos to build visual face models. Building on Multiple-Instance Learning algorithms, we introduce the novel concepts of "Quasi-Positive bags" and "Extended Diverse Density", and use them to develop an automatic training scheme. We also propose using the "Relative Sparsity" of a cluster to detect the anchorperson in news videos. Experiments show that our algorithm learns correct models for the persons of interest. The automatically learned models are tested against a supervised learning algorithm for face recognition in large news video databases, and show promising results.
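The abstract builds on the standard Diverse Density formulation from Multiple-Instance Learning (the paper's "Extended Diverse Density" and "Quasi-Positive bags" are modifications whose details are not given here). As background, a minimal sketch of classic Diverse Density with a noisy-OR bag model follows; the Gaussian instance kernel and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def instance_prob(t, x):
    # Gaussian kernel: probability that instance x matches concept point t.
    # (Illustrative choice; the bandwidth is fixed to 1 here.)
    return np.exp(-np.sum((np.asarray(x) - np.asarray(t)) ** 2))

def bag_prob(t, bag):
    # Noisy-OR model: a bag matches t if at least one of its instances does.
    return 1.0 - np.prod([1.0 - instance_prob(t, x) for x in bag])

def diverse_density(t, pos_bags, neg_bags):
    # DD(t) is high when t lies near some instance in EVERY positive bag
    # while staying far from ALL instances in every negative bag.
    dd = 1.0
    for bag in pos_bags:
        dd *= bag_prob(t, bag)
    for bag in neg_bags:
        dd *= 1.0 - bag_prob(t, bag)
    return dd

# Toy example: each positive bag contains one instance near the true
# concept (the origin) plus a distractor; the negative bag sits far away.
pos_bags = [[[0.0, 0.0], [5.0, 5.0]],
            [[0.1, 0.0], [-4.0, 3.0]]]
neg_bags = [[[5.0, 5.0]]]

dd_true = diverse_density([0.0, 0.0], pos_bags, neg_bags)
dd_far = diverse_density([5.0, 5.0], pos_bags, neg_bags)
```

In a face-modeling setting, a "bag" would correspond to the set of faces detected in a video segment whose speech transcript mentions a name; maximizing DD over candidate concept points then picks out the face region shared by positive bags.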