Audiovisual Gestalts

  • Authors:
  • Gianluca Monaci; Pierre Vandergheynst

  • Affiliations:
  • Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland; Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland

  • Venue:
  • CVPRW '06 Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop
  • Year:
  • 2006

Abstract

This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony contributes strongly to the integration of cross-modal information in humans. We therefore define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions over redundant dictionaries of functions, which makes it possible to define perceptually meaningful audiovisual events. These cross-modal structures are detected using a simple rule called the Helmholtz principle. Experimental results show that, by extracting significant synchronous audiovisual events, we can detect the existing cross-modal correlation between the two signals even in the presence of distracting motion and acoustic noise. These results confirm that temporal proximity between audiovisual events is a key ingredient for the integration of information across modalities and that it can be effectively exploited in the design of multi-modal analysis algorithms.
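
The abstract does not spell out the detection rule, so the following is only a minimal sketch of how a Helmholtz-principle test for temporal synchrony could look, not the authors' implementation. In its usual a contrario form, the principle says that a structure is meaningful when it would be very unlikely to occur by chance in random data. The sketch assumes that each signal has already been reduced to a list of event times (for example, from the sparse decompositions), and that under the background model video events fall independently and uniformly in time. The function names, the tolerance `tol`, and the `n_tests` factor are all hypothetical choices made for illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_synchronous(audio_times, video_times, tol):
    """Count video events that have at least one audio event within tol seconds."""
    return sum(
        any(abs(v - a) <= tol for a in audio_times) for v in video_times
    )


def number_of_false_alarms(audio_times, video_times, duration, tol, n_tests=1):
    """Expected number of chance detections under the background model.

    Assumed background model (not taken from the paper): video events are
    placed independently and uniformly in [0, duration]. The chance that one
    such event lands within tol of some audio event is bounded above by the
    total length of the tolerance windows divided by the duration.
    """
    p = min(1.0, 2 * tol * len(audio_times) / duration)
    k = count_synchronous(audio_times, video_times, tol)
    # n_tests is the number of candidate video sources examined (assumption).
    return n_tests * binomial_tail(len(video_times), k, p)


# Toy data: one audio track and two hypothetical video sources over 10 seconds.
audio = [1.0, 2.0, 3.0, 4.0, 5.0]
video_in_sync = [1.02, 2.01, 2.98, 4.03, 5.0]
video_distractor = [0.4, 1.6, 2.5, 3.5, 4.6]

nfa_sync = number_of_false_alarms(audio, video_in_sync, 10.0, 0.05, n_tests=2)
nfa_distractor = number_of_false_alarms(audio, video_distractor, 10.0, 0.05, n_tests=2)

# A source is declared meaningful when its number of false alarms is below 1.
print("in-sync source meaningful:", nfa_sync < 1)
print("distractor meaningful:", nfa_distractor < 1)
```

In this toy setting the in-sync source has every event close to an audio event, which is very improbable under the background model, while the distractor has none. Using an upper bound for `p` keeps the test conservative, since the binomial tail grows with `p`.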