Glimpsing IVA: a framework for overcomplete/complete/undercomplete convolutive source separation

  • Authors:
  • Alireza Masnadi-Shirazi;Wenyi Zhang;Bhaskar D. Rao

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA;Bloomberg L.P., New York, NY and Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA;Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
  • Year:
  • 2010

Abstract

Independent vector analysis (IVA) is a method for separating convolutively mixed signals that significantly reduces the occurrence of the well-known permutation problem in frequency-domain blind source separation (BSS). In this paper, we develop a novel IVA-based unifying framework for overcomplete/complete/undercomplete convolutive noisy BSS. We show that in order for the sources to be separable in the frequency domain, they must have a temporal dynamic structure. We exploit a common form of dynamics, especially present in speech, wherein the signals have intermittent silence periods, so that the set of active sources varies with time. This feature is extremely useful in dealing with overcomplete situations. An approach using hidden Markov models (HMMs) is proposed that takes advantage of the different combinations of silence gaps of the source signals at each time period. This enables the algorithm to "glimpse" or listen in the gaps, hence compensating for the global degeneracy by allowing it to learn the mixing matrices at periods where it is locally less degenerate. The same glimpsing strategy can be applied to the complete/undercomplete case as well. Moreover, additive noise is considered in our model. Real and simulated experiments were carried out for overcomplete convolutive mixtures of speech signals, yielding improved separation results compared to a sparsity-based robust time-frequency masking method. Signal-to-disturbance ratio (SDR) and machine intelligibility of a speech recognizer were used to evaluate their performances. Experiments were also conducted for the classical complete setting using the proposed algorithm and compared with standard IVA, showing that the results compare favorably.
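
As a toy illustration of the glimpsing idea only, and not the authors' algorithm, the sketch below enumerates the on/off activity combinations of the sources and keeps those in which the number of active sources does not exceed the number of microphones. The function names and the 3-source, 2-microphone setup are assumptions made for the example; the abstract does not specify how the HMM states are defined.

```python
from itertools import product


def activity_states(num_sources):
    """Return every on/off combination of the sources (1 = active, 0 = silent)."""
    return list(product((0, 1), repeat=num_sources))


def locally_determined_states(num_sources, num_mics):
    """Keep the states whose active-source count does not exceed the microphone count.

    Hypothetical helper: in these states the mixture is locally complete or
    undercomplete, even if the overall problem is overcomplete.
    """
    return [s for s in activity_states(num_sources) if sum(s) <= num_mics]


# Assumed setup for illustration: 3 sources recorded by 2 microphones.
states = activity_states(3)
glimpses = locally_determined_states(3, 2)
print(len(states), "activity states in total")
print(len(glimpses), "states with at most 2 active sources")
```

With 3 sources and 2 microphones, the only state excluded is the one in which all three sources are active at once, which is the intuition behind learning the mixing matrices during the silence gaps.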