Cross-modal analysis is a natural progression beyond processing of single-source signals. Simultaneous processing of two sources can reveal information that is unavailable when handling the sources separately. Indeed, human and animal perception, computer vision, weather forecasting, and various other scientific and technological fields can benefit from such a paradigm. A particular cross-modal problem is localization: out of the entire data array originating from one source, localize the components that best correlate with the other. For example, auditory and visual data sampled from a scene can be used to localize visual events associated with the soundtrack. In this paper we present a rigorous analysis of fundamental problems associated with the localization task. We then develop an approach that leads efficiently to a unique, high-definition localization outcome. Our method is based on canonical correlation analysis (CCA), where the inherent ill-posedness is removed by exploiting the sparsity of cross-modal events. We apply our approach to the localization of audio-visual events. The proposed algorithm captures such dynamic audio-visual events with high spatial resolution. The algorithm effectively detects the pixels that are associated with sound, while filtering out other dynamic pixels, overcoming substantial visual distractions and audio noise. The algorithm is simple and efficient thanks to its reliance on linear programming, and it is free of user-defined parameters.
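The core computational idea, seeking the sparsest set of pixels whose temporal variation explains the audio, can be reduced to a linear program. The following is a minimal sketch on synthetic data, not the paper's full CCA formulation: it replaces the whitened CCA step with a plain basis-pursuit problem (minimize the L1 norm of the pixel weights subject to reproducing the audio signal), which illustrates how sparsity singles out the sounding pixel and rejects dynamic distractors. All variable names and the toy data layout are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T, N = 40, 200                  # frames, pixels; N >> T makes the problem ill-posed

audio = rng.standard_normal(T)  # 1-D audio feature per frame (toy data)

# Toy video: most pixels are static (zero temporal variation), a handful are
# dynamic distractors uncorrelated with the sound, and pixel 5 follows the audio.
V = np.zeros((T, N))
for j in (30, 80, 120, 170):
    V[:, j] = rng.standard_normal(T)   # dynamic but unrelated to the sound
V[:, 5] = audio                         # the "sounding" pixel

# Basis-pursuit LP:  min ||w||_1  subject to  V @ w = audio.
# Split w = wp - wm with wp, wm >= 0 to linearize the L1 objective.
c = np.ones(2 * N)
A_eq = np.hstack([V, -V])
res = linprog(c, A_eq=A_eq, b_eq=audio, bounds=(0, None), method="highs")
w = res.x[:N] - res.x[N:]

print("most sound-correlated pixel:", int(np.argmax(np.abs(w))))
```

Because the L1 objective is linear after the sign split, an off-the-shelf LP solver suffices and no regularization weight or threshold has to be tuned, mirroring the paper's claim of a parameter-free, linear-programming-based algorithm.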