Visual localization of non-stationary sound sources

  • Authors:
  • Yuyu Liu; Yoichi Sato

  • Affiliations:
  • The University of Tokyo, Tokyo, Japan

  • Venue:
  • MM '09: Proceedings of the 17th ACM International Conference on Multimedia
  • Year:
  • 2009


Abstract

A sound source can be localized visually by analyzing the correlation between audio and visual data. To date, however, correctly analyzing this correlation has required the sound source to remain stationary in the scene. We introduce a technique that overcomes this limitation by localizing non-stationary sound sources. The problem is formulated as finding the optimal visual trajectories over the pixels of a spatio-temporal volume that best represent the movement of the sound source. Using a beam search, we find these optimal trajectories by maximizing the correlation between newly introduced audiovisual inconsistency features. We also develop an incremental correlation evaluation based on mutual information, which significantly reduces the computational cost. The correlations computed along the optimal trajectories are finally incorporated into a segmentation technique to localize the sound-source region in the first visual frame of the current time window. Experimental results demonstrate the effectiveness of our method.
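The core idea of the abstract — a beam search over pixel trajectories, scored by the mutual information between audio features and the visual features sampled along each trajectory — can be illustrated with a minimal sketch. This is not the paper's implementation: the feature definitions, function names, and parameters below are illustrative assumptions, the MI estimate is a simple histogram plug-in, and unlike the paper's incremental evaluation it recomputes the correlation from scratch at every trajectory extension.

```python
import numpy as np
from itertools import product

def mutual_information(a, v, bins=4):
    """Plug-in estimate of mutual information between two 1-D sequences."""
    joint, _, _ = np.histogram2d(a, v, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)  # marginal of the audio feature
    pv = p.sum(axis=0, keepdims=True)  # marginal of the visual feature
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pa @ pv)[nz])).sum())

def beam_search_trajectory(video, audio, beam_width=5, step=1):
    """Search for the pixel trajectory whose visual features correlate
    most strongly (by MI) with the audio features.

    video : (T, H, W) array of per-pixel visual features (hypothetical
            stand-in for the paper's audiovisual inconsistency features)
    audio : (T,) array of per-frame audio features
    Returns a list of T (y, x) positions, one per frame.
    """
    T, H, W = video.shape

    def traj_score(traj):
        v = np.array([video[t, y, x] for t, (y, x) in enumerate(traj)])
        return mutual_information(audio[:len(traj)], v)

    # Start with a length-1 trajectory at every pixel of the first frame.
    beam = [[(y, x)] for y in range(H) for x in range(W)]
    for t in range(1, T):
        candidates = []
        for traj in beam:
            y, x = traj[-1]
            # Extend only to spatial neighbours (movement constraint).
            for dy, dx in product(range(-step, step + 1), repeat=2):
                ny, nx = y + dy, x + dx
                if 0 <= ny < H and 0 <= nx < W:
                    candidates.append(traj + [(ny, nx)])
        # Prune: keep the beam_width highest-scoring partial trajectories.
        candidates.sort(key=traj_score, reverse=True)
        beam = candidates[:beam_width]
    return max(beam, key=traj_score)
```

Rescoring every candidate in full makes the cost grow with the trajectory length; the incremental evaluation described in the abstract exists precisely to avoid this recomputation.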