This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. Because the log-spectra can be regarded as scaling factors, this results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain. The probabilistic relation allows the frequency coefficients and the log-spectra to be treated as two random variables, both of which are estimated from the noisy signal. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference in this full probabilistic model is computationally intractable, we developed two efficient approximations: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and by speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), while those reconstructed from the estimated log-spectra produced a lower word recognition error rate, because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress.
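The generative structure described above can be illustrated with a minimal sketch for a single frequency bin. This is not the paper's implementation. It assumes a real-valued scalar coefficient (actual frequency coefficients are complex), and the function name `sample_gsmm` and its parameters are hypothetical. A log-spectrum is first drawn from a GMM, and the coefficient is then drawn from a zero-mean Gaussian whose variance is the exponential of that log-spectrum.

```python
import math
import random


def sample_gsmm(weights, means, stds, n, seed=0):
    """Draw n (log_spectrum, coefficient) pairs for one frequency bin.

    weights, means, stds: parameters of a hypothetical 1-D GMM over
    the log-spectrum (illustrative values, not taken from the paper).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a mixture component of the log-spectral GMM.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Draw the log-spectrum from the chosen Gaussian component.
        log_spec = rng.gauss(means[k], stds[k])
        # Draw the coefficient: zero mean, variance exp(log_spec).
        # Simplifying assumption: a real scalar, not a complex value.
        coeff = rng.gauss(0.0, math.sqrt(math.exp(log_spec)))
        samples.append((log_spec, coeff))
    return samples


if __name__ == "__main__":
    for log_spec, coeff in sample_gsmm([0.3, 0.7], [-2.0, 1.0], [0.5, 0.5], 5):
        print(f"log-spectrum {log_spec:+.3f}  coefficient {coeff:+.3f}")
```

Marginalizing over the log-spectrum gives a heavier-tailed distribution for the coefficient than a single Gaussian would, which is the sense in which the log-spectrum acts as a scaling factor.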