Audio-visual group recognition using diffusion maps

  • Authors:
  • Yosi Keller; Ronald R. Coifman; Stéphane Lafon; Steven W. Zucker

  • Affiliations:
  • School of Engineering, Bar Ilan University, Israel; Department of Mathematics, Yale University, New Haven, CT; Google Inc., Mountain View, CA; Department of Computer Science, Yale University, New Haven, CT

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2010


Abstract

Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resultant structures are then combined in diffusion coordinates. In particular, since different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. This is a fundamental issue in multisensor acquisition and processing, overlooked in prior approaches. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
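The pipeline the abstract describes — embed each channel with a diffusion map, using the density-invariant (α = 1, Laplace-Beltrami) kernel normalization, then combine the channels in diffusion coordinates — can be sketched as follows. This is a minimal illustration of the standard diffusion-maps construction, not the authors' implementation; the function names, the kernel bandwidth `eps`, and the simple concatenation-based fusion at the end are assumptions made for the example.

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_coords=2, alpha=1.0):
    """Sketch of a diffusion-map embedding of the rows of X.

    alpha=1.0 gives the density-invariant (Laplace-Beltrami)
    normalization the abstract refers to, which compensates for
    sensors sampling the same phenomenon with different densities.
    """
    # Gaussian kernel on pairwise squared distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)

    # Density normalization: K_alpha(i,j) = K(i,j) / (q_i^alpha q_j^alpha)
    q = K.sum(axis=1)
    K = K / np.outer(q ** alpha, q ** alpha)

    # Row-normalize to a Markov transition matrix P
    P = K / K.sum(axis=1, keepdims=True)

    # Diffusion coordinates: leading non-trivial eigenvectors of P,
    # scaled by their eigenvalues (the trivial eigenpair is skipped)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    return vecs[:, 1:n_coords + 1] * vals[1:n_coords + 1]

def fuse_channels(audio_feats, visual_feats, eps=1.0, n_coords=2):
    """Hypothetical fusion step: embed each channel separately,
    then combine the per-channel diffusion coordinates."""
    Ya = diffusion_map(audio_feats, eps=eps, n_coords=n_coords)
    Yv = diffusion_map(visual_feats, eps=eps, n_coords=n_coords)
    return np.hstack([Ya, Yv])
```

Because each channel is embedded on its own, the fused coordinates are insensitive to the very different raw representations (e.g. audio spectra vs. lip-region pixels); only the intrinsic geometry of each channel survives into the joint representation.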