Audiovisual Gestalts

Authors:
Gianluca Monaci;Pierre Vandergheynst
Affiliations:
Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland;Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland
Venue:
CVPRW '06 Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop
Year:
2006

Citing 0
Cited 2

Visual localization of non-stationary sound sources
MM '09 Proceedings of the 17th ACM international conference on Multimedia

Geometric video approximation using weighted matching pursuit
IEEE Transactions on Image Processing


Abstract

This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony contributes strongly to the integration of cross-modal information in humans. Thus, we define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions over redundant dictionaries of functions, which makes it possible to define perceptually meaningful audiovisual events. These cross-modal structures are detected using a simple rule called the Helmholtz principle. Experimental results show that, by extracting significant synchronous audiovisual events, we can detect the cross-modal correlation between the two signals even in the presence of distracting motion and acoustic noise. These results confirm that temporal proximity between audiovisual events is a key ingredient for the integration of information across modalities and that it can be effectively exploited in the design of multi-modal analysis algorithms.
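
The detection step can be illustrated with a minimal sketch, assuming the a contrario reading of the Helmholtz principle: a structure is meaningful when it would be very unlikely to arise by chance under an independence model. The abstract does not give the paper's exact formulation, so the event times, the `window` tolerance, the binomial null model, the `n_tests` correction, and all function names below are hypothetical choices made for illustration only.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_coincidences(audio_times, video_times, window):
    """Count video events with at least one audio event within `window` seconds."""
    return sum(
        any(abs(v - a) <= window for a in audio_times) for v in video_times
    )


def number_of_false_alarms(audio_times, video_times, duration, window, n_tests=1):
    """Expected number of chance occurrences of the observed synchrony.

    Assumed null model: video events fall uniformly and independently of
    the audio events over a clip lasting `duration` seconds.
    """
    n = len(video_times)
    k = count_coincidences(audio_times, video_times, window)
    # Upper bound on the chance that one uniformly placed video event lands
    # within `window` of some audio event (edge effects ignored).
    p = min(1.0, len(audio_times) * 2 * window / duration)
    return n_tests * binomial_tail(n, k, p)


def is_meaningful(nfa, epsilon=1.0):
    """A structure is kept when fewer than `epsilon` false alarms are expected."""
    return nfa < epsilon


# Hypothetical event times in seconds for a 10 s clip.
audio = [1.0, 3.0, 5.0, 7.0, 9.0]
synced_video = [1.02, 3.01, 4.98, 7.03, 9.0]  # moves together with the sound
distractor = [0.3, 2.2, 4.1, 6.4, 8.5]        # unrelated motion

# Two candidate video structures are tested, hence n_tests=2.
nfa_synced = number_of_false_alarms(audio, synced_video, 10.0, 0.05, n_tests=2)
nfa_distractor = number_of_false_alarms(audio, distractor, 10.0, 0.05, n_tests=2)

print("synced meaningful:", is_meaningful(nfa_synced))
print("distractor meaningful:", is_meaningful(nfa_distractor))
```

In this toy setting the synchronous video structure yields an expected number of false alarms far below 1, while the distracting motion does not, which mirrors the claim that temporal proximity separates the sound-related structure from distractors. In a fuller pipeline the event times would come from the sparse audio and video decompositions rather than being listed by hand.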