Recovery of audio-to-video synchronization through analysis of cross-modality correlation

  • Authors:
  • Yuyu Liu; Yoichi Sato

  • Affiliations:
  • Institute of Industrial Science, The University of Tokyo, Tokyo (both authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2010


Abstract

Audio-to-video synchronization (AV-sync) may drift and is difficult to recover without time-consuming effort. Based on an analysis of audiovisual correlation, we developed a method for recovering drifted AV-sync in a video clip with only minor human interaction: users need only specify a time window containing a stationary speaker. Within this window, we search for the optimum drift that maximizes the average audiovisual correlation inside the speaker region, shifting the audio and computing the correlation for each drift hypothesis, and then recover AV-sync from the refined optimum drift. The audiovisual correlation is measured by Quadratic Mutual Information with Kernel Density Estimation, which is robust to changes in audiovisual scale and independent of the spoken language. Experimental results demonstrate that our method effectively recovers audio-to-video synchronization. A preliminary version of this work was reported at the 2008 IAPR International Conference on Pattern Recognition (Liu and Sato, 2008) and won the Best Industry Related Paper Award (BIRPA).
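The drift search described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the audio and visual tracks have already been reduced to 1-D per-frame feature sequences (the paper's actual features and speaker-region handling are more involved), and it uses the closed-form Euclidean-distance variant of quadratic mutual information with Gaussian kernels, which may differ from the exact QMI estimator used in the paper. The function names are hypothetical.

```python
import numpy as np

def gauss(d, sigma):
    # 1-D Gaussian kernel evaluated at pairwise differences d
    return np.exp(-d**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def qmi(x, y, sigma=0.3):
    """Quadratic mutual information (Euclidean-distance form) between two
    1-D feature sequences, estimated via Gaussian kernel density estimation.
    Uses the closed-form 'information potential' expressions, so no explicit
    density grid is needed."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    # Convolving two Gaussian kernels doubles the variance, hence sqrt(2)*sigma.
    Kx = gauss(x[:, None] - x[None, :], np.sqrt(2.0) * sigma)
    Ky = gauss(y[:, None] - y[None, :], np.sqrt(2.0) * sigma)
    v_joint = np.mean(Kx * Ky)                             # joint potential
    v_marg = np.mean(Kx) * np.mean(Ky)                     # marginal potential
    v_cross = np.mean(Kx.mean(axis=1) * Ky.mean(axis=1))   # cross potential
    return v_joint + v_marg - 2.0 * v_cross

def recover_drift(audio_feat, visual_feat, max_drift):
    """Search drift hypotheses (in frames) within +/- max_drift and return the
    one that maximizes QMI between the shifted audio and the visual feature."""
    best_drift, best_score = 0, -np.inf
    for d in range(-max_drift, max_drift + 1):
        shifted = np.roll(audio_feat, d)       # shift audio by the hypothesis
        lo, hi = max_drift, len(audio_feat) - max_drift  # drop wrapped edges
        score = qmi(shifted[lo:hi], visual_feat[lo:hi])
        if score > best_score:
            best_drift, best_score = d, score
    return best_drift
```

On synthetic features where the audio track is a delayed, lightly noised copy of the visual track, the QMI score peaks sharply at the true delay, because at the correct alignment the joint samples concentrate on a line while at wrong drifts the two sequences are nearly independent.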