Speech spectral modeling and enhancement based on autoregressive conditional heteroscedasticity models

Authors:
Israel Cohen
Affiliations:
Department of Electrical Engineering, Technion--Israel Institute of Technology, Technion City, Haifa, Israel
Venue:
Signal Processing
Year:
2006

Citing 4
Cited 6

Modeling speech signals in the time-frequency domain using GARCH

Signal Processing
A modular approach to speech enhancement with an application to speech coding

ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement

EURASIP Journal on Applied Signal Processing
Hidden Markov processes

IEEE Transactions on Information Theory

Modeling speech signals in the time-frequency domain using GARCH

Signal Processing
Estimating multivariate ARCH parameters by two-stage least-squares method

Signal Processing
Simultaneous parameter estimation and state smoothing of complex GARCH process in the presence of additive noise

Signal Processing
Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss

Speech Communication
An efficient solution to improve the spectral noise suppression rules

Digital Signal Processing
Two dimensional noncausal AR-ARCH model: Stationary conditions, parameter estimation and its application to anomaly detection

Signal Processing

Quantified Score

Hi-index	0.08

Abstract

In this paper, we develop and evaluate speech enhancement algorithms based on supergaussian generalized autoregressive conditional heteroscedasticity (GARCH) models in the short-time Fourier transform (STFT) domain. We consider three statistical models, two fidelity criteria, and two approaches to estimating the variances of the STFT coefficients. The statistical model is Gaussian, Gamma, or Laplacian; the fidelity criterion is either minimum mean-squared error (MMSE) of the STFT coefficients or MMSE of the log-spectral amplitude (LSA); and the spectral variance is estimated by either the proposed GARCH models or the decision-directed method of Ephraim and Malah. We show that estimating the variance by GARCH modeling yields lower log-spectral distortion and higher perceptual evaluation of speech quality (PESQ, ITU-T P.862) scores than the decision-directed method, whether the presumed statistical model is Gaussian, Gamma, or Laplacian, and whether the fidelity criterion is MMSE of the STFT coefficients or MMSE of the LSA. Furthermore, while the Gaussian model is inferior to the supergaussian models when using the decision-directed method, the Gaussian model is superior when using the GARCH modeling method.
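
To make the two variance-estimation approaches named in the abstract more concrete, here is a minimal sketch in Python. It is not the paper's estimator. `garch11_variance` implements the textbook GARCH(1,1) recursion on directly observed samples, whereas the paper works with noisy STFT coefficients. `decision_directed_snr` follows the general form of the Ephraim and Malah decision-directed rule, but pairs it with a Wiener gain and a maximum-likelihood first frame, both of which are simplifying assumptions. The function names, parameter names, and default weighting factor are all hypothetical.

```python
def garch11_variance(samples, omega, alpha, beta):
    """One-step-ahead conditional variances of a textbook GARCH(1,1) process.

    Assumption: the samples are observed directly (no additive noise), and
    the recursion starts from the unconditional variance.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0, alpha + beta < 1")
    var = omega / (1.0 - alpha - beta)  # unconditional (stationary) variance
    variances = []
    for x in samples:
        variances.append(var)  # variance predicted before observing x
        # The next variance depends on the latest squared magnitude
        # and on the previous conditional variance.
        var = omega + alpha * abs(x) ** 2 + beta * var
    return variances


def decision_directed_snr(noisy_power, noise_var, alpha=0.98):
    """A priori SNR per frame for one frequency bin, decision-directed style.

    Assumptions: a Wiener gain produces the previous clean-speech estimate,
    and the first frame uses the maximum-likelihood term alone.
    """
    prev_clean_power = None
    snr = []
    for power in noisy_power:
        gamma = power / noise_var           # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous SNR estimate
        if prev_clean_power is None:
            xi = ml_term
        else:
            xi = alpha * prev_clean_power / noise_var + (1.0 - alpha) * ml_term
        gain = xi / (1.0 + xi)              # Wiener gain (assumed here)
        prev_clean_power = gain ** 2 * power
        snr.append(xi)
    return snr
```

In this sketch the GARCH recursion carries the previous conditional variance forward explicitly, while the decision-directed rule smooths the a priori SNR using the previous frame's clean-speech estimate.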