Cross-Modal Localization via Sparsity

  • Authors:
  • E. Kidron; Y. Y. Schechner; M. Elad

  • Affiliations:
  • Dept. of Electr. Eng., Technion - Israel Inst. Technol., Haifa

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2007


Abstract

Cross-modal analysis is a natural progression beyond processing of single-source signals. Simultaneous processing of two sources can reveal information that is unavailable when handling the sources separately. Indeed, human and animal perception, computer vision, weather forecasting, and various other scientific and technological fields can benefit from such a paradigm. A particular cross-modal problem is localization: out of the entire data array originating from one source, localize the components that best correlate with the other. For example, auditory and visual data sampled from a scene can be used to localize visual events associated with the sound track. In this paper we present a rigorous analysis of fundamental problems associated with the localization task. We then develop an approach that leads efficiently to a unique, high-definition localization outcome. Our method is based on canonical correlation analysis (CCA), where inherent ill-posedness is removed by exploiting sparsity of cross-modal events. We apply our approach to localization of audio-visual events. The proposed algorithm captures such dynamic audio-visual events with high spatial resolution. The algorithm effectively detects the pixels that are associated with sound, while filtering out other dynamic pixels, overcoming substantial visual distractions and audio noise. The algorithm is simple and efficient thanks to its reliance on linear programming, while being free of user-defined parameters.
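The core idea of sparsity-regularized cross-modal localization can be illustrated with a minimal sketch. This is not the authors' exact CCA formulation; it is a simplified stand-in that keeps the two ingredients the abstract names: an l1 objective that enforces sparsity of the pixel weights, and a linear program that solves it. The single-constraint program and the helper name `sparse_localization` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_localization(audio, pixels):
    """Toy sparse audio-visual localization (illustrative, not the paper's CCA).

    audio:  (T,) audio feature per frame (e.g. band energy).
    pixels: (T, N) per-pixel temporal features (e.g. frame differences).

    Solves  min ||w||_1  s.t.  c^T w = 1,  where c = pixels^T audio is the
    audio-pixel cross-correlation. The l1 objective drives all weight onto
    the most strongly audio-correlated pixels. Splitting w = u - v with
    u, v >= 0 turns it into a standard linear program.
    """
    c = pixels.T @ audio                     # cross-modal correlation per pixel
    n = c.size
    obj = np.ones(2 * n)                     # sum(u) + sum(v) = ||w||_1
    A_eq = np.concatenate([c, -c])[None, :]  # c^T (u - v) = 1
    res = linprog(obj, A_eq=A_eq, b_eq=[1.0], bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v                             # sparse pixel weights

# Toy scene: pixel 7 moves in sync with the audio, the rest are distractors.
rng = np.random.default_rng(0)
T, N = 200, 20
audio = rng.standard_normal(T)
pixels = 0.1 * rng.standard_normal((T, N))
pixels[:, 7] += audio                        # the audio-associated pixel
w = sparse_localization(audio, pixels)
print(int(np.argmax(np.abs(w))))             # -> 7: the sounding pixel is localized
```

With a single linear constraint, the l1 minimizer concentrates on the coordinate with the largest cross-correlation magnitude, so the distractor pixels receive zero weight; this mirrors, in miniature, how the sparsity prior in the paper yields a unique, high-resolution localization without user-defined parameters.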