Finding Important People in Large News Video Databases Using Multimodal and Clustering Analysis

  • Authors:
  • Duy-Dinh Le;Shin'ichi Satoh;Michael E. Houle;Dat Phuoc Tat Nguyen

  • Affiliations:
  • National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430. ledduy@nii.ac.jp;National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430. satoh@nii.ac.jp;National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430. meh@nii.ac.jp;The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan 113-8656. nptdat@mi.ci.i.u-tokyo.ac.jp

  • Venue:
  • ICDEW '07 Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering Workshop
  • Year:
  • 2007

Abstract

The wide availability of large-scale databases calls for more efficient and scalable tools for data understanding and knowledge discovery. In this paper, we present a method for finding important people who appear repeatedly within a given time period in large news video databases. Specifically, we investigate two issues: how to group similar faces to find dominant groups, and how to label these groups with the corresponding names for identification. These problems are challenging for three reasons. First, the same person can appear with large variations in hair style, illumination and pose, which makes comparing similar faces difficult. Second, the number of people and their frequencies of occurrence are unknown, which complicates finding dominant and useful groups. Third, faces and names in news video usually do not appear together, which makes aligning faces with names difficult. To handle these problems, we propose using a clustering model based on relevant set correlation, which can efficiently handle datasets of millions of objects represented in thousands or even millions of dimensions, to find groups of similar faces in a large and noisy face dataset. Then, to identify the faces in each cluster, names extracted from the transcripts are filtered and used to find the best face-name correspondences with methods developed in the statistical machine translation literature. Experiments on large video datasets containing hundreds of hours of video showed that our system can efficiently find important people, characterizing them not only by their appearance but also by their identities.
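The abstract names the relevant set correlation (RSC) model without giving its formulas. The sketch below is a minimal illustration, assuming the set correlation measure that underlies RSC: the Pearson correlation between the 0/1 membership vectors of two sets A and B drawn from a universe of N items, R(A, B) = (N|A∩B| - |A||B|) / sqrt(|A|(N-|A|)|B|(N-|B|)). It also assumes one RSC-style quality score for a candidate cluster A, the average of R(A, Q(v, |A|)) over its members v, where Q(v, k) is the k-nearest-neighbour set of v. The function names, the brute-force neighbour search and the toy 2-D points (standing in for face feature vectors) are all hypothetical; this is not the paper's full clustering procedure, which the abstract does not detail.

```python
import math


def set_correlation(a: set, b: set, n: int) -> float:
    """Pearson correlation between the membership indicator vectors of
    sets a and b within a universe of n items."""
    ka, kb = len(a), len(b)
    if not (0 < ka < n and 0 < kb < n):
        raise ValueError("sets must be non-empty proper subsets of the universe")
    inter = len(a & b)
    return (n * inter - ka * kb) / math.sqrt(ka * (n - ka) * kb * (n - kb))


def sq_dist(p, q) -> float:
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(p, q))


def knn_set(points, i: int, k: int) -> set:
    """Hypothetical brute-force k-NN query; the query point counts as its
    own nearest neighbour, and ties are broken by index for determinism."""
    order = sorted(range(len(points)), key=lambda j: (sq_dist(points[i], points[j]), j))
    return set(order[:k])


def self_correlation(points, members: set) -> float:
    """Average correlation between a candidate cluster and the
    same-sized neighbourhood of each of its members."""
    k, n = len(members), len(points)
    return sum(set_correlation(members, knn_set(points, v, k), n) for v in members) / k


# Two tight toy groups standing in for two people's face descriptors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(self_correlation(points, {0, 1, 2}))  # coherent group scores 1.0
print(self_correlation(points, {0, 1, 3}))  # mixed group scores about 0.111
```

A coherent group, whose members all have each other as nearest neighbours, reaches the maximum score of 1.0, while a group that mixes two people scores much lower. This is the kind of signal a set-correlation model can use to separate dominant face groups from noise without fixing the number of clusters in advance.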