A robust method to count and locate audio sources in a stereophonic linear instantaneous mixture

  • Authors: Simon Arberet; Rémi Gribonval; Frédéric Bimbot

  • Affiliations: IRISA, France; IRISA, France; IRISA, France

  • Venue: ICA'06 Proceedings of the 6th international conference on Independent Component Analysis and Blind Signal Separation

  • Year: 2006

Abstract

We propose a robust method to estimate the number of audio sources and the mixing matrix in a linear instantaneous mixture, even with more sources than sensors. Our method is based on a multiscale Short Time Fourier Transform (STFT), and relies on the assumption that in the neighborhood of some (unknown) scales and time-frequency points, only one source contributes to the mixture. Such time-frequency regions provide local estimates of the corresponding columns of the mixing matrix. Our main contribution is a new clustering algorithm called DEMIX to estimate the number of sources and the mixing matrix based on such local estimates. In contrast to DUET or other similar sparsity-based algorithms, which rely on a global scatter plot, our algorithm exploits a local confidence measure to weight the influence of each time-frequency point in the estimated matrix. Inspired by the work of Deville, the confidence measure relies on the time-frequency local persistence of the activity/inactivity of each source. Experiments are provided with stereophonic mixtures and show the improved performance of DEMIX compared to K-means or ELBG clustering algorithms.
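The idea that a time-frequency region dominated by a single source yields a local estimate of one mixing-matrix column can be illustrated with a small numerical sketch. This is not the DEMIX algorithm itself. It assumes a real-valued 2 x N mixing matrix whose columns are unit vectors (cos θ, sin θ), it synthesizes the stereo STFT coefficients of a region directly instead of computing a multiscale STFT, and it uses a hypothetical confidence score (the ratio of the two eigenvalues of the local covariance) as a stand-in for the paper's confidence measure. The function name `local_direction` and all parameter values are chosen for illustration only.

```python
import numpy as np

def local_direction(region):
    """Estimate a mixing direction from one time-frequency neighborhood.

    region: complex array of shape (2, n_points) holding the stereo
    STFT coefficients of the neighborhood (left and right channels).
    Returns (theta, confidence), with theta in [0, pi).
    """
    # Real 2x2 covariance of the stereo coefficients
    # (assumes a real-valued instantaneous mixing matrix).
    cov = np.real(region @ region.conj().T) / region.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    principal = eigvecs[:, -1]              # dominant direction
    # The sign of an eigenvector is arbitrary, so fold the angle into [0, pi).
    theta = np.arctan2(principal[1], principal[0]) % np.pi
    # Hypothetical confidence: dominant energy over residual energy.
    confidence = eigvals[-1] / max(eigvals[0], 1e-12)
    return theta, confidence

rng = np.random.default_rng(0)
n = 64  # number of time-frequency points in the neighborhood (assumed)

# Region where a single source is active, plus weak noise.
true_theta = np.pi / 3
column = np.array([np.cos(true_theta), np.sin(true_theta)])
source = rng.standard_normal(n) + 1j * rng.standard_normal(n)
noise = 0.01 * (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n)))
one_source_region = np.outer(column, source) + noise

# Region where a second source with a different direction is also active.
other_column = np.array([np.cos(2.0), np.sin(2.0)])
other_source = rng.standard_normal(n) + 1j * rng.standard_normal(n)
two_source_region = one_source_region + np.outer(other_column, other_source)

theta_hat, conf_one = local_direction(one_source_region)
_, conf_two = local_direction(two_source_region)
print(f"estimated angle: {theta_hat:.3f} rad, true angle: {true_theta:.3f} rad")
print(f"confidence, one source: {conf_one:.1f}; two sources: {conf_two:.1f}")
```

In this sketch the single-source region gives an angle close to the true one and a much larger score than the two-source region, which is the kind of information a confidence-weighted clustering step could use to down-weight unreliable time-frequency points.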