Speech enhancement using Gaussian scale mixture models

Authors:
Jiucang Hao;Te-Won Lee;Terrence J. Sejnowski
Affiliations:
Computational Neurobiology Laboratory, Salk Institute, La Jolla, CA and Institute for Neural Computation, University of California, San Diego, CA;Qualcomm, Inc., San Diego, CA;Howard Hughes Medical Institute and Computational Neurobiology Laboratory, Salk Institute, La Jolla, CA and Division of Biological Sciences, University of California, San Diego, CA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows both to be treated as random variables, each to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two efficient approximations: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), while those reconstructed from the estimated log-spectra produced a lower word recognition error rate, because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
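
The generative model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single frequency bin with a real-valued coefficient, and the function name `sample_gsmm` and the mixture parameters are hypothetical values chosen only for illustration. A log-spectrum is drawn from a GMM, and the coefficient is then drawn from a zero-mean Gaussian whose variance is the exponential of that log-spectrum.

```python
import numpy as np


def sample_gsmm(n, weights, means, stds, rng):
    """Draw n (coefficient, log-spectrum) pairs from a scalar GSMM.

    Hypothetical helper for illustration; parameters are not from the paper.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Choose a mixture component for each sample.
    k = rng.choice(len(weights), size=n, p=weights)
    # Log-spectrum: Gaussian within the chosen component (the GMM part).
    xi = rng.normal(means[k], stds[k])
    # Coefficient: zero-mean Gaussian with variance exp(xi),
    # so the standard deviation is exp(xi / 2).
    x = rng.normal(0.0, np.exp(0.5 * xi))
    return x, xi


rng = np.random.default_rng(0)
weights = [0.6, 0.4]   # assumed mixture weights
means = [-2.0, 1.0]    # assumed component means of the log-spectrum
stds = [0.5, 0.3]      # assumed component standard deviations

x, xi = sample_gsmm(200_000, weights, means, stds, rng)

# For this model, E[x^2] = sum_k w_k * exp(mu_k + sigma_k^2 / 2),
# using the mean of a log-normal variable for each component.
expected_var = float(
    np.sum(np.array(weights) * np.exp(np.array(means) + 0.5 * np.array(stds) ** 2))
)
empirical_var = float(np.mean(x ** 2))

# A Gaussian has kurtosis 3; a scale mixture is heavier-tailed.
kurtosis = float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)
```

Because the scale itself is random, the marginal distribution of the coefficient is heavier-tailed than a single Gaussian, which the kurtosis value above reflects.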