Psychoacoustically constrained and distortion minimized speech enhancement
IEEE Transactions on Audio, Speech, and Language Processing
EURASIP Journal on Advances in Signal Processing - Special issue on robust processing of nonstationary signals
Hi-index | 0.00 |
Although many discrete Fourier transform (DFT) domain-based speech enhancement methods rely on stochastic models to derive clean speech estimators, like the Gaussian and Laplace distribution, certain speech sounds clearly show a more deterministic character. In this paper, we study the use of a deterministic model in combination with the well-known stochastic models for speech enhancement. We derive a minimum mean-square error (MMSE) estimator under a combined stochastic-deterministic speech model with speech presence uncertainty and show that for different distributions of the DFT coefficients the combined stochastic-deterministic speech model leads to improved performance of approximately 0.8 dB segmental signal-to-noise ratio (SNR) over the use of a stochastic model alone. Evaluation with perceptual evaluation of speech quality (PESQ) shows performance improvements of approximately 0.15 on an MOS scale