Real-time speech separation by semi-supervised nonnegative matrix factorization

Authors:
Cyril Joder;Felix Weninger;Florian Eyben;David Virette;Björn Schuller
Affiliations:
Institute for Human-Machine Communication, Technische Universität München, München, Germany;Institute for Human-Machine Communication, Technische Universität München, München, Germany;Institute for Human-Machine Communication, Technische Universität München, München, Germany;HUAWEI Technologies Düsseldorf GmbH, Germany;Institute for Human-Machine Communication, Technische Universität München, München, Germany
Venue:
LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation
Year:
2012

Abstract

In this paper, we present an online semi-supervised algorithm for real-time separation of speech and background noise. The proposed system is based on Nonnegative Matrix Factorization (NMF): fixed speech bases are learned from training data, whereas the noise components are estimated in real time from the recent past. Experiments with spontaneous conversational speech and real-life non-stationary noise show that this system performs as well as a supervised NMF algorithm that exploits noise components learned from the same noise environment as the test sample. Furthermore, it outperforms a supervised system trained on different noise conditions.
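
The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Euclidean cost with standard multiplicative updates, a magnitude spectrogram V of shape (frequencies, frames), and a simple sliding window standing in for "the recent past". All function names, the window length, and the iteration counts are hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def fit_noise(V, W_speech, n_noise, n_iter=50, W_noise=None, seed=0):
    """Fit V ~ [W_speech, W_noise] @ H while keeping W_speech fixed.

    Assumption: Euclidean cost with multiplicative updates; the paper's
    actual cost function and update schedule may differ.
    """
    rng = np.random.default_rng(seed)
    n_speech = W_speech.shape[1]
    if W_noise is None:
        W_noise = rng.random((V.shape[0], n_noise)) + EPS
    else:
        W_noise = W_noise.copy()  # warm start from a previous window
    H = rng.random((n_speech + n_noise, V.shape[1])) + EPS
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        # Update the activations of all components (speech and noise).
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        # Update only the noise bases; the speech bases stay untouched.
        H_noise = H[n_speech:]
        W_noise *= (V @ H_noise.T) / (W @ H @ H_noise.T + EPS)
    return W_noise, H


def speech_estimate(V, W_speech, W_noise, H):
    """Apply a soft (Wiener-like) mask built from the two partial models."""
    n_speech = W_speech.shape[1]
    speech = W_speech @ H[:n_speech]
    noise = W_noise @ H[n_speech:]
    return V * speech / (speech + noise + EPS)


def separate_online(V, W_speech, n_noise=4, window=20, n_iter=20):
    """Process frames one at a time, re-estimating noise on recent frames."""
    out = np.zeros_like(V, dtype=float)
    W_noise = None
    for t in range(V.shape[1]):
        recent = V[:, max(0, t - window + 1): t + 1]
        W_noise, H = fit_noise(recent, W_speech, n_noise, n_iter, W_noise)
        # Keep only the estimate for the newest frame.
        out[:, t] = speech_estimate(recent, W_speech, W_noise, H)[:, -1]
    return out
```

In this sketch, W_speech would come from an ordinary NMF run on clean speech recordings, and separate_online(V, W_speech) returns a speech magnitude estimate with the same shape as V. Warm-starting the noise bases from the previous window is one plausible way to keep the per-frame cost low; whether the paper does this is not stated in the abstract.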