Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments

Authors:
Hacı Taşmaz;Ergun Erçelebi
Affiliations:
Vocational High School, University of Gaziantep, 27310 Gaziantep, Turkey;Department of Electrical and Electronics Engineering, University of Gaziantep, 27310 Gaziantep, Turkey
Venue:
Digital Signal Processing
Year:
2008

Abstract

In this paper, we propose a new speech enhancement system that integrates a perceptual filterbank with minimum mean square error short-time spectral amplitude (MMSE-STSA) estimation, modified according to speech presence uncertainty. The perceptual filterbank is designed by adjusting the undecimated wavelet packet decomposition (UWPD) tree according to the critical bands of the psychoacoustic model of the human auditory system. The modified MMSE-STSA estimator is used to estimate the speech in the undecimated wavelet packet domain. The perceptual filterbank provides a good auditory representation (sufficient frequency resolution), good perceptual speech quality, and low computational load. The MMSE-STSA estimator is based on a priori SNR estimation. The a priori SNR, a key parameter of the MMSE-STSA estimator, is estimated with the "decision-directed" method, which provides a trade-off between noise reduction and signal distortion when correctly tuned. The proposed method is compared with two other popular methods, Wiener estimation and MMSE log-spectral amplitude (MMSE-LSA) estimation in the frequency domain. Performance is evaluated with three objective quality measures (SNR, segmental SNR (segSNR), and Itakura-Saito distance (ISd)) for various noise types and SNRs. The experimental results and the objective quality measures confirm the performance of the proposed system, which provides sufficient noise reduction and good intelligibility and perceptual quality without causing considerable signal distortion or musical background noise.
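
The decision-directed a priori SNR estimate mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the smoothing factor `alpha = 0.98`, the floor `xi_min`, and the first-frame initialization are assumptions chosen for illustration, and a Wiener gain stands in for the MMSE-STSA gain (which requires Bessel functions and, in the paper, a speech-presence-uncertainty modification). The inputs are assumed to be per-frame power values, one per frequency bin or subband, together with a noise power estimate for each bin.

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Sketch of decision-directed a priori SNR estimation.

    noisy_power: list of frames; each frame is a list of |Y(k)|^2 values.
    noise_power: list of noise power estimates, one per bin (assumed known).
    Returns a list of frames, each a list of spectral gains in (0, 1).
    """
    num_bins = len(noise_power)
    # Power of the enhanced amplitude from the previous frame, per bin.
    prev_clean_power = [0.0] * num_bins
    gains = []
    for frame_index, frame in enumerate(noisy_power):
        frame_gains = []
        for k in range(num_bins):
            # A posteriori SNR: noisy power relative to noise power.
            gamma = frame[k] / noise_power[k]
            # Instantaneous estimate of the a priori SNR, clipped at zero.
            instantaneous = max(gamma - 1.0, 0.0)
            if frame_index == 0:
                # Assumption: no history yet, so use the instantaneous term.
                xi = instantaneous
            else:
                # Weighted mix of the previous frame's estimate and the
                # instantaneous term; alpha sets the trade-off between
                # noise reduction and signal distortion.
                xi = (alpha * prev_clean_power[k] / noise_power[k]
                      + (1.0 - alpha) * instantaneous)
            xi = max(xi, xi_min)
            # Wiener gain used here as a simple stand-in for the MMSE-STSA gain.
            gain = xi / (1.0 + xi)
            prev_clean_power[k] = (gain ** 2) * frame[k]
            frame_gains.append(gain)
        gains.append(frame_gains)
    return gains


# Two frames, two bins: bin 0 is noise-only, bin 1 has strong speech.
example_gains = decision_directed_gains(
    noisy_power=[[1.0, 100.0], [1.0, 100.0]],
    noise_power=[1.0, 1.0],
)
```

In this toy input the noise-only bin receives a gain close to zero while the high-SNR bin is passed almost unchanged. Values of `alpha` nearer to 1 smooth the estimate more strongly across frames, which tends to reduce musical noise at the cost of slower tracking of speech onsets.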