Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation

Authors:
Alexey Ozerov;Cédric Févotte
Affiliations:
METISS Team of IRISA, INRIA, Rennes Cedex, France and Institut Telecom, Telecom ParisTech, CNRS, LTCI, Paris, France;CNRS, LTCI, Telecom ParisTech, Paris, France
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We consider inference in a general data-driven object-based model of multichannel audio data, assumed to be generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired by nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second consists of maximizing the sum of the individual likelihoods of all channels using a multiplicative update algorithm inspired by NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces results competitive with the state of the art, as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).
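
To make the source model mentioned in the abstract more concrete, here is a minimal sketch of the textbook single-channel NMF with the Itakura-Saito divergence, fitted by multiplicative updates. This is an assumption-laden illustration, not the multichannel EM or multichannel multiplicative update algorithm of the paper: it factorizes one nonnegative power spectrogram V (frequency bins by frames) as V ≈ WH and ignores the mixing parameters entirely. The function names `is_divergence` and `is_nmf`, the small constant `eps`, and the random initialization are all hypothetical choices made for this example.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over all entries: sum(V/V_hat - log(V/V_hat) - 1)."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, n_components, n_iter=100, seed=0, eps=1e-12):
    """Single-channel IS-NMF sketch: V (F x N, strictly positive) ~ W (F x K) @ H (K x N).

    Uses the standard heuristic multiplicative updates for the IS divergence.
    `eps` is an assumed numerical safeguard against division by zero.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Random strictly positive initialization (an assumption of this sketch).
    W = rng.random((F, n_components)) + 0.1
    H = rng.random((n_components, N)) + 0.1
    costs = [is_divergence(V, W @ H + eps)]
    for _ in range(n_iter):
        # Update activations H with W held fixed.
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat))
        # Update spectral patterns W with H held fixed.
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T)
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs
```

In use, V would typically be the squared magnitude of an STFT; calling `W, H, costs = is_nmf(V, n_components=4)` returns nonnegative factors and the divergence recorded before the first update and after each iteration. The updates keep W and H nonnegative because they only multiply positive quantities, which is the property that makes this family of algorithms attractive for spectrogram modeling.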