An efficient method for face retrieval from large video datasets

  • Authors:
  • Thao Ngoc Nguyen; Thanh Duc Ngo; Duy-Dinh Le; Shin'ichi Satoh; Bac Hoai Le; Duc Anh Duong

  • Affiliations:
  • University of Science, Ho Chi Minh City, Vietnam (Thao Ngoc Nguyen, Bac Hoai Le, Duc Anh Duong); The Graduate University for Advanced Studies, Chiyoda-ku, Tokyo, Japan (Thanh Duc Ngo); National Institute of Informatics, Chiyoda-ku, Tokyo, Japan (Duy-Dinh Le, Shin'ichi Satoh)

  • Venue:
  • Proceedings of the ACM International Conference on Image and Video Retrieval
  • Year:
  • 2010


Abstract

The human face is one of the most important objects in video: it provides rich information for spotting people of interest, such as government leaders in news video or the hero in a movie, and is a basis for interpreting what happens on screen. Detecting and recognizing the faces appearing in video are therefore essential tasks for many video indexing and retrieval applications. Robust face matching remains challenging due to large variations in pose, illumination, occlusion, hairstyle, and facial expression. In addition, when the dataset contains a huge number of faces, e.g. tens of millions, a scalable matching method is needed. To this end, we propose an efficient method for face retrieval in large video datasets. To make retrieval robust, the faces of the same person appearing within a shot are grouped into a single face track using a reliable tracking method. Retrieval is performed by computing the similarity between each face track in the database and the input face track. For each face track we select one representative face, and the similarity between two face tracks is the similarity between their representative faces. The representative face is the mean of a subset of faces selected from the original track. In this way, we achieve high retrieval accuracy while keeping the computational cost low. For the experiments, we extracted approximately 20 million faces from 370 hours of TRECVID video, a scale that previous work has not addressed. Results on a manually annotated subset of 457,320 faces show that the proposed method is both effective and scalable.
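
As a rough illustration of the representative-face matching described in the abstract, the Python/NumPy sketch below forms a track's representative face as the mean of a sampled subset of its face descriptors and ranks database tracks against a query track. The fixed-length descriptor vectors, the evenly spaced subset selection, and the cosine similarity are assumptions made for this example only; they stand in for the paper's actual feature extraction, subset-selection, and matching choices.

    import numpy as np

    def representative_face(track, subset_size=5):
        """Mean of a sampled subset of a track's face descriptors.

        `track` is an (n_faces, dim) array. Evenly spaced sampling is a
        placeholder for the paper's subset-selection step.
        """
        n = len(track)
        idx = np.linspace(0, n - 1, num=min(subset_size, n), dtype=int)
        return track[idx].mean(axis=0)

    def track_similarity(track_a, track_b):
        """Track-to-track similarity = similarity of representative faces
        (cosine similarity used here as an example metric)."""
        a = representative_face(track_a)
        b = representative_face(track_b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def retrieve(query_track, database_tracks, top_k=10):
        """Rank database face tracks by similarity to the query track."""
        scores = [track_similarity(query_track, t) for t in database_tracks]
        order = np.argsort(scores)[::-1][:top_k]
        return [(int(i), scores[i]) for i in order]

    # Toy usage with random descriptors standing in for real face features.
    rng = np.random.default_rng(0)
    query = rng.normal(size=(12, 128))
    database = [rng.normal(size=(rng.integers(5, 30), 128)) for _ in range(100)]
    print(retrieve(query, database, top_k=3))

Because each track is reduced to a single descriptor, matching a query against N database tracks costs N vector comparisons regardless of how many faces each track contains, which is what keeps retrieval cheap at the scale of millions of faces.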