Application of non-negative spectrogram decomposition with sparsity constraints to single-channel speech enhancement

Authors:
Kyogu Lee
Affiliations:
-
Venue:
Speech Communication
Year:
2014

Abstract

We propose an algorithm for single-channel speech enhancement that requires no pre-trained models (neither speech nor noise models), using non-negative spectrogram decomposition with sparsity constraints. Before starting the EM algorithm for spectrogram decomposition, we divide the spectral basis vectors into two disjoint groups, a speech group and a noise group, and impose sparsity constraints only on the vectors in the speech group as we update the parameters. Once the EM algorithm converges, the proposed algorithm separates speech from noise, and no post-processing is required for speech reconstruction. Experiments with various types of real-world noise show that the proposed algorithm performs significantly better than classical algorithms and comparably to the spectrogram decomposition method that uses pre-trained noise models.
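The grouped decomposition described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the author's implementation. It fits an asymmetric PLCA model, P_t(f) = sum_z P(f|z) P_t(z), to the noisy magnitude spectrogram with EM, learning both groups from the noisy input alone. It treats the first `n_speech` bases as the speech group and, reading the abstract's sparsity constraint as applying to the speech bases, uses a simple exponent-sharpening step (`sharpen` > 1) as a hypothetical stand-in for the paper's actual sparsity prior, which is not specified here. The function name `separate_speech` and every default value are illustrative. Speech is recovered by giving each time-frequency bin the speech group's share of the fitted model.

```python
import numpy as np


def separate_speech(V, n_speech=20, n_noise=4, sharpen=1.1, n_iter=100, seed=0):
    """Split a magnitude spectrogram V (freq x time) into speech and noise parts.

    Hypothetical sketch: asymmetric PLCA fitted by EM, with the first
    `n_speech` bases forming the speech group. The `sharpen` exponent is an
    illustrative sparsity heuristic, not the constraint used in the paper.
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k = n_speech + n_noise
    speech = slice(0, n_speech)  # indices of the speech group

    # W[f, z] = P(f | z): each basis is a distribution over frequency.
    W = rng.random((n_freq, k))
    W /= W.sum(axis=0, keepdims=True)
    # H[z, t] = P_t(z): per-frame mixture weights over all bases.
    H = rng.random((k, n_frames))
    H /= H.sum(axis=0, keepdims=True)

    for _ in range(n_iter):
        # E-step folded into the M-step: the posterior P_t(z | f) is
        # W[f, z] * H[z, t] / (W @ H)[f, t], so weighting V by it and
        # summing over t (for W) or f (for H) gives these products.
        Q = V / (W @ H + eps)
        W_new = W * (Q @ H.T)
        H_new = H * (W.T @ Q)

        # Hypothetical sparsity heuristic, speech group only: a power > 1
        # concentrates each speech basis on fewer bins after renormalizing.
        W_new[:, speech] **= sharpen

        W = W_new / (W_new.sum(axis=0, keepdims=True) + eps)
        H = H_new / (H_new.sum(axis=0, keepdims=True) + eps)

    # Reconstruction: split the observed magnitude between the two groups
    # in proportion to each group's contribution to the fitted model.
    mask = (W[:, speech] @ H[speech]) / (W @ H + eps)  # values in [0, 1]
    S = mask * V
    N = V - S
    return S, N


# Usage on a synthetic, strictly positive "spectrogram".
rng = np.random.default_rng(1)
V = rng.random((64, 50)) + 0.01
S, N = separate_speech(V, n_iter=30)
```

In practice V would be the magnitude of the noisy STFT; a waveform estimate is typically obtained by reapplying the noisy phase to S and inverting the STFT, which this sketch leaves out. Because the mask is a ratio of non-negative model terms, S and N are non-negative and sum to V, so no separate post-processing stage is needed to form the speech estimate.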