Under-determined reverberant audio source separation using a full-rank spatial covariance model

Authors:
Ngoc Q. K. Duong;Emmanuel Vincent;Rémi Gribonval
Affiliations:
INRIA, Centre Inria Rennes-Bretagne Atlantique, Rennes Cedex, France;INRIA, Centre Inria Rennes-Bretagne Atlantique, Rennes Cedex, France;INRIA, Centre Inria Rennes-Bretagne Atlantique, Rennes Cedex, France
Venue:
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Year:
2010

Abstract

This paper addresses the modeling of reverberant recording environments in the context of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial characteristics of the source. We then consider four specific covariance models, including a full-rank unconstrained model. We derive a family of iterative expectation-maximization (EM) algorithms to estimate the parameters of each model, and we propose procedures adapted from state-of-the-art methods to initialize the parameters and to align the order of the estimated sources across all frequency bins. Experimental results on reverberant synthetic mixtures and live recordings of speech data show the effectiveness of the proposed approach.
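
To make the Gaussian model in the abstract concrete, the following is a minimal sketch of how source contributions could be recovered at a single time-frequency bin once the model parameters are known. The abstract gives no formulas, so several things here are assumptions made for illustration: each source covariance is factored as a scalar variance `v[j]` times a spatial covariance matrix `R[j]`, the mixture is the sum of the source contributions, and separation is done by multichannel Wiener filtering. The function name `wiener_separate` and all variable names are hypothetical, and the parameters are random placeholders rather than EM estimates.

```python
import numpy as np


def wiener_separate(x, v, R):
    """Estimate each source's contribution at one time-frequency bin.

    x: (I,) complex mixture vector over I channels
    v: (J,) nonnegative source variances (assumed known here)
    R: (J, I, I) Hermitian positive semi-definite spatial covariances
    Returns a (J, I) array of estimated source contributions.
    """
    # Assumed covariance of each source contribution: Sigma_j = v_j * R_j
    sigma_c = v[:, None, None] * R
    # Under independence, the mixture covariance is the sum over sources
    sigma_x = sigma_c.sum(axis=0)
    # Wiener estimate: Sigma_j @ inv(Sigma_x) @ x, computed with a linear solve
    y = np.linalg.solve(sigma_x, x)
    return sigma_c @ y


# Toy under-determined case: J = 3 sources, I = 2 channels
rng = np.random.default_rng(0)
I, J = 2, 3
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ A.conj().transpose(0, 2, 1)  # full-rank Hermitian matrices (placeholders)
v = rng.uniform(0.5, 2.0, size=J)
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)

c_hat = wiener_separate(x, v, R)
```

In this sketch the estimated contributions always sum back to the mixture, because the per-source Wiener filters add up to the identity matrix. With a full-rank `R[j]`, each filter can weight the channels in a way that a rank-1 (single steering vector) model cannot, which is the kind of flexibility the full-rank unconstrained model in the abstract refers to.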