Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues

Authors:
Cédric Févotte; Alexey Ozerov
Affiliations:
CNRS, LTCI, Telecom ParisTech, Paris, France; IRISA, INRIA, Rennes, France
Venue:
CMMR'10 Proceedings of the 7th international conference on Exploring music contents
Year:
2010

Abstract

Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al. as a means of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model, in contrast with usual BSS assumptions, and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgerald et al. requires an a posteriori clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.
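
To make the PARAFAC structure mentioned in the abstract concrete, here is a minimal sketch in Python/NumPy. It assumes a nonnegative spectrogram tensor `V` of shape (frequencies, frames, channels) approximated as `V[f, n, i] ≈ sum_k Q[i, k] * W[f, k] * H[n, k]`, where each column of `Q` plays the role of the spatial cue of one component. The generalized Kullback-Leibler divergence and the standard multiplicative updates are one possible measure of fit, chosen here only for illustration. The names `ntf_parafac_kl`, `reconstruct`, `kl_divergence`, `V`, `W`, `H` and `Q` are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np


def reconstruct(W, H, Q):
    """PARAFAC model: V_hat[f, n, i] = sum_k W[f, k] * H[n, k] * Q[i, k]."""
    return np.einsum('fk,nk,ik->fni', W, H, Q)


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))


def ntf_parafac_kl(V, n_components, n_iter=100, seed=0, eps=1e-12):
    """Illustrative NTF with multiplicative updates for the KL divergence.

    V has shape (F, N, I): frequency bins, time frames, channels.
    Returns W (F, K), H (N, K), Q (I, K) and the divergence per iteration.
    """
    rng = np.random.default_rng(seed)
    F, N, I = V.shape
    # Random strictly positive initialization (an arbitrary choice).
    W = rng.random((F, n_components)) + eps
    H = rng.random((N, n_components)) + eps
    Q = rng.random((I, n_components)) + eps
    history = []
    for _ in range(n_iter):
        # Spectral patterns: ratio-weighted numerator over the marginal sums.
        R = V / (reconstruct(W, H, Q) + eps)
        W *= np.einsum('fni,nk,ik->fk', R, H, Q) / (
            np.einsum('nk,ik->k', H, Q) + eps)
        # Temporal activations.
        R = V / (reconstruct(W, H, Q) + eps)
        H *= np.einsum('fni,fk,ik->nk', R, W, Q) / (
            np.einsum('fk,ik->k', W, Q) + eps)
        # Spatial cues (one column per component).
        R = V / (reconstruct(W, H, Q) + eps)
        Q *= np.einsum('fni,fk,nk->ik', R, W, H) / (
            np.einsum('fk,nk->k', W, H) + eps)
        history.append(kl_divergence(V, reconstruct(W, H, Q), eps))
    return W, H, Q, history
```

In this sketch the components are not yet grouped into sources. Clustering the columns of `Q` after the factorization would be a separate step, corresponding to the a posteriori clustering that the abstract attributes to the original approach.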