Adaptive speaker identification with audiovisual cues for movie content analysis

  • Authors:
  • Ying Li;Shrikanth S. Narayanan;C.-C. Jay Kuo

  • Affiliations:
  • Integrated Media Systems Center, Department of Electrical Engineering, University of Southern California, Los Angeles, CA (all authors)

  • Venue:
  • Pattern Recognition Letters - Video computing
  • Year:
  • 2004


Abstract

An adaptive speaker identification system that employs both audio and visual cues is proposed in this work for movie content analysis. Specifically, a likelihood-based approach is first applied to speaker identification using pure speech data, while techniques such as face detection/recognition and mouth tracking are applied to talking-face recognition using pure visual data. These two information cues are then integrated under a probabilistic framework to achieve more robust results. Moreover, to account for speakers' voice variations over time, we propose updating their acoustic models on the fly by adapting to their incoming speech data. Improved system performance (80% identification accuracy) was observed on two test movies.
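
The two core ideas in the abstract — fusing per-speaker audio and visual scores under one framework, and drifting a speaker's acoustic model toward incoming speech — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fusion weights, score scales, and the simple moving-average adaptation below are all assumptions standing in for the paper's likelihood-based models and on-the-fly adaptation.

```python
def fuse_scores(audio_loglik, face_score, w_audio=0.6, w_visual=0.4):
    """Combine per-speaker audio log-likelihoods with talking-face
    scores via a weighted sum and return the best-scoring speaker.
    Weights are illustrative, not from the paper."""
    fused = {
        s: w_audio * audio_loglik[s] + w_visual * face_score.get(s, 0.0)
        for s in audio_loglik
    }
    return max(fused, key=fused.get)


def adapt_mean(old_mean, new_samples, alpha=0.1):
    """Shift a speaker's acoustic-model mean toward newly observed
    speech features -- a crude stand-in for on-the-fly model
    adaptation that tracks voice variation over time."""
    sample_mean = sum(new_samples) / len(new_samples)
    return (1 - alpha) * old_mean + alpha * sample_mean
```

For example, a speaker with a weaker audio match can still win when the visual evidence (a detected talking face) strongly favors them, which is the robustness the probabilistic fusion aims for.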