Non-negative hidden Markov modeling of audio with application to source separation

Authors:
Gautham J. Mysore; Paris Smaragdis; Bhiksha Raj
Affiliations:
Center for Computer Research in Music and Acoustics, Stanford University; Advanced Technology Labs, Adobe Systems Inc.; School of Computer Science, Carnegie Mellon University
Venue:
LVA/ICA'10: Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation
Year:
2010

Abstract

In recent years, a great deal of work has modeled audio using non-negative matrix factorization and its probabilistic counterparts, because these methods yield rich models that are very useful for source separation and automatic music transcription. Given a sound source, these algorithms learn a dictionary of spectral vectors that best explains it. However, this dictionary is learned in a manner that disregards a very important aspect of sound: its temporal structure. We propose a novel algorithm, the non-negative hidden Markov model (N-HMM), that extends these models by jointly learning several small spectral dictionaries together with a Markov chain that describes how the signal moves between these dictionaries. We also extend this algorithm to the non-negative factorial hidden Markov model (N-FHMM) to model sound mixtures, and demonstrate that it yields superior performance in single-channel source separation tasks.
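
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of plain non-negative matrix factorization, not the N-HMM or N-FHMM itself. It assumes the standard multiplicative updates for the generalized Kullback-Leibler divergence and a magnitude spectrogram stored as a frequency-by-time array. The function names `nmf_kl` and `kl_divergence`, the parameter defaults, and the random toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def nmf_kl(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x time) as W @ H.

    W holds the dictionary of spectral vectors (one per column) and H holds
    their per-frame activations. Uses multiplicative updates for the
    generalized KL divergence. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        # Update activations with the dictionary held fixed.
        R = V / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        # Update the dictionary with the activations held fixed.
        R = V / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized KL divergence between V and its approximation V_hat."""
    return np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat)


# Toy usage on synthetic non-negative data (a stand-in for a spectrogram).
toy_rng = np.random.default_rng(1)
V_toy = toy_rng.random((20, 4)) @ toy_rng.random((4, 30))
W_toy, H_toy = nmf_kl(V_toy, n_components=4)
print(kl_divergence(V_toy, W_toy @ H_toy))
```

Note that every time frame here draws on the same single dictionary `W`, and the columns of `H` are treated as unordered. That is the limitation the abstract points to: nothing in this factorization models how the spectrum evolves from one frame to the next.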