A tractable framework for estimating and combining spectral source models for audio source separation

Authors:
Simon Arberet;Alexey Ozerov;Frédéric Bimbot;Rémi Gribonval
Affiliations:
Institute of Electrical Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland;INRIA, Rennes Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes cedex, France;IRISA, CNRS-UMR 6074, Campus de Beaulieu, 35042 Rennes cedex, France;INRIA, Rennes Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes cedex, France
Venue:
Signal Processing
Year:
2012

Abstract

The underdetermined blind audio source separation (BSS) problem is often addressed in the time-frequency (TF) domain by assuming that each TF point is modeled as an independent random variable with a sparse distribution. Methods based on structured spectral models, such as Spectral Gaussian Scaled Mixture Models (Spectral-GSMMs) or Spectral Non-negative Matrix Factorization models, perform better because they exploit the statistical diversity of audio source spectrograms, which allows them to go beyond the simple sparsity assumption. However, for discrete state-based models such as Spectral-GSMMs, learning the models from the mixture can be computationally very expensive. One of the main problems is that a classical Expectation-Maximization procedure often leads to a complexity that is exponential in the number of sources. In this paper, we propose a framework with linear complexity for learning spectral source models (including discrete state-based models) from noisy source estimates. Moreover, this framework allows different probabilistic models to be combined, which can be seen as a form of probabilistic fusion. We illustrate that methods based on this framework can significantly improve BSS performance compared to state-of-the-art approaches.
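
The complexity argument can be made concrete with a small counting sketch. It assumes, purely for illustration, that each of J sources has a discrete-state spectral model with K states; the names `joint_state_count` and `separate_state_count` and the values of J and K are hypothetical and do not come from the paper. Modeling all sources jointly means considering every combination of source states per frame (K^J of them), whereas handling each source model separately from its own estimate needs only J*K state evaluations. This shows only the counting argument, not the proposed algorithm.

```python
from itertools import product


def joint_state_count(num_sources: int, states_per_source: int) -> int:
    """Count joint state combinations by explicit enumeration.

    Every source can be in any of its states, so the joint state
    space is the Cartesian product of the per-source state sets.
    """
    return sum(1 for _ in product(range(states_per_source), repeat=num_sources))


def separate_state_count(num_sources: int, states_per_source: int) -> int:
    """Count state evaluations when each source is handled on its own."""
    return num_sources * states_per_source


if __name__ == "__main__":
    K = 8  # assumed number of states per source model (illustrative)
    for J in range(1, 6):  # assumed numbers of sources (illustrative)
        print(f"J={J}: joint={joint_state_count(J, K)}, "
              f"separate={separate_state_count(J, K)}")
```

The joint count multiplies by K with every added source, while the separate count grows by only K, which is the gap between the exponential and linear regimes described above.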