Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria

Authors: Tuomas Virtanen
Affiliations: Tampere University of Technology
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2007



Abstract

An unsupervised learning algorithm for the separation of sound sources in one-channel music signals is presented. The algorithm is based on factorizing the magnitude spectrogram of an input signal into a sum of components, each of which has a fixed magnitude spectrum and a time-varying gain. Each sound source, in turn, is modeled as a sum of one or more components. The parameters of the components are estimated by minimizing the reconstruction error between the input spectrogram and the model, while restricting the component spectrograms to be nonnegative and favoring components whose gains are slowly varying and sparse. Temporal continuity is favored by a cost term that is the sum of squared differences between the gains in adjacent frames, and sparseness is favored by penalizing nonzero gains. The proposed iterative estimation algorithm is initialized with random values, and the gains and the spectra are then alternately updated using multiplicative update rules until the values converge. Simulation experiments were carried out using generated mixtures of pitched musical instrument samples and drum sounds. The performance of the proposed method was compared with independent subspace analysis and basic nonnegative matrix factorization, which are based on the same linear model. According to these simulations, the proposed method achieves better separation quality than the previous algorithms. In particular, the temporal continuity criterion improved the detection of pitched musical sounds. The sparseness criterion did not produce significant improvements.
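
The estimation loop described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's exact algorithm. Several choices here are assumptions made for illustration: the reconstruction error is taken to be the squared Euclidean distance (the abstract does not name the measure), sparseness is implemented as a plain sum of the gains, the multiplicative rules are derived with the usual heuristic of splitting the gradient into positive and negative parts, and the spectra are rescaled to unit norm after each update so that the penalties cannot be reduced simply by shrinking the gains. The function name `separate` and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np


def separate(X, n_components, alpha=1.0, beta=0.1, n_iter=200, seed=0):
    """Factor a nonnegative spectrogram X (frequencies x frames) as B @ G.

    Illustrative cost (an assumption, not taken from the source):
        0.5 * ||X - B G||^2
        + 0.5 * alpha * sum_j sum_t (G[j, t] - G[j, t-1])**2   # temporal continuity
        + beta * sum(G)                                         # sparseness
    Assumes X has at least two frames.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_freq, n_frames = X.shape

    # Random nonnegative initialization, as described in the abstract.
    B = rng.random((n_freq, n_components)) + eps    # component spectra
    G = rng.random((n_components, n_frames)) + eps  # time-varying gains

    # Number of adjacent frames per frame: 1 at the edges, 2 elsewhere.
    n_nbr = np.full(n_frames, 2.0)
    n_nbr[0] = n_nbr[-1] = 1.0

    for _ in range(n_iter):
        # Spectra update: standard multiplicative rule for the Euclidean error.
        B *= (X @ G.T) / (B @ (G @ G.T) + eps)

        # Fix the scale: unit-norm spectra, gains absorb the norms.
        norms = np.linalg.norm(B, axis=0) + eps
        B /= norms
        G *= norms[:, None]

        # Sum of the gains in the adjacent frames (negative gradient part
        # of the temporal continuity term).
        nbr = np.zeros_like(G)
        nbr[:, 1:] += G[:, :-1]
        nbr[:, :-1] += G[:, 1:]

        # Gain update: negative gradient terms over positive gradient terms.
        num = B.T @ X + alpha * nbr
        den = (B.T @ B) @ G + alpha * n_nbr * G + beta + eps
        G *= num / den

    return B, G
```

Calling `B, G = separate(X, n_components=3)` on a magnitude spectrogram `X` returns unit-norm spectra in the columns of `B` and the corresponding gains in the rows of `G`. Setting `alpha` and `beta` to zero reduces the loop to basic nonnegative matrix factorization with the Euclidean error, which makes it easy to compare the effect of each criterion.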