Model-based expectation-maximization source separation and localization

Authors:
Michael I. Mandel;Ron J. Weiss;Daniel P. W. Ellis
Affiliations:
Department of Electrical Engineering, Columbia University, New York, NY;Department of Electrical Engineering, Columbia University, New York, NY;Department of Electrical Engineering, Columbia University, New York, NY
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Citing 6
Cited 8

Blind separation of disjoint orthogonal signals: demixing N sources from 2 mixtures

ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 05
Fast communication: Perceptual evaluation of blind source separation for robust speech recognition

Signal Processing
Blind separation of speech mixtures via time-frequency masking

IEEE Transactions on Signal Processing
Performance measurement in blind audio source separation

IEEE Transactions on Audio, Speech, and Language Processing
Mask estimation for missing data speech recognition based on statistics of binaural interaction

IEEE Transactions on Audio, Speech, and Language Processing
Self-localizing dynamic microphone arrays

IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews

Evaluating source separation algorithms with reverberant speech

IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Combining localization cues and source model constraints for binaural source separation

Speech Communication
The cocktail party robot: sound source separation and localisation with an active binaural head

HRI '12 Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
A latently constrained mixture model for audio source separation and localization

LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation
Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking

Computer Speech and Language
Sparse coding with adaptive dictionary learning for underdetermined blind speech separation

Speech Communication
Modulation domain blind speech separation in noisy environments

Speech Communication
Bayesian Nonparametrics for Microphone Array Processing

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)


Abstract

This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and Perceptual Evaluation of Speech Quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
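
The mixture-over-sources-and-delays idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified EM loop that uses only the interaural phase difference (the level-difference term is left out), assumes a zero-mean Gaussian on the wrapped phase residual with one standard deviation per source, and takes initial delay guesses from the caller. The names `em_ipd_masks`, `wrap`, `taus`, and `init_delays`, as well as the sign convention for the phase difference, are hypothetical choices made for this illustration.

```python
import numpy as np

def wrap(x):
    """Wrap phase values to the interval (-pi, pi]."""
    return np.angle(np.exp(1j * x))

def em_ipd_masks(ipd, omega, taus, init_delays, n_iter=20):
    """Simplified EM over a (source, delay) mixture of phase-residual Gaussians.

    ipd:         (F, T) observed interaural phase differences in radians.
    omega:       (F,) frequency of each bin in radians per sample.
    taus:        (D,) candidate interaural delays in samples.
    init_delays: one initial delay guess per source (assumed to be given).
    Returns (masks, psi): masks is (I, F, T), psi is (I, D).
    """
    init = np.asarray(init_delays, dtype=float)
    n_src = len(init)
    # Wrapped phase residual of every point under every candidate delay: (D, F, T).
    resid = wrap(ipd[None] - omega[None, :, None] * taus[:, None, None])
    # Initialise p(source, delay) as a bump around each initial guess.
    psi = np.exp(-0.5 * (taus[None, :] - init[:, None]) ** 2)
    psi /= psi.sum()
    sigma = np.full(n_src, 0.5)  # residual std per source (simplifying assumption)
    for _ in range(n_iter):
        # E-step: posterior nu over (source, delay) at each spectrogram point.
        log_lik = (-0.5 * (resid[None] / sigma[:, None, None, None]) ** 2
                   - np.log(sigma)[:, None, None, None])
        log_post = np.log(psi + 1e-12)[:, :, None, None] + log_lik
        log_post -= log_post.max(axis=(0, 1), keepdims=True)
        nu = np.exp(log_post)
        nu /= nu.sum(axis=(0, 1), keepdims=True)
        # M-step: re-estimate mixing weights and residual spreads.
        psi = nu.mean(axis=(2, 3))
        var = (nu * resid[None] ** 2).sum(axis=(1, 2, 3)) / nu.sum(axis=(1, 2, 3))
        sigma = np.maximum(np.sqrt(var), 1e-3)
    # Marginalising the posterior over delay gives one soft mask per source.
    masks = nu.sum(axis=1)
    return masks, psi
```

In this sketch the returned `masks` sum to one across sources at every spectrogram point, so multiplying a mixture spectrogram by `masks[i]` would give a soft estimate of source `i`, and the peak of `psi[i]` along the delay axis serves as a delay estimate for that source.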