Joint time-frequency segmentation algorithm for transient speech decomposition and speech enhancement

Authors:
Charturong Tantibundhit;Franz Pernkopf;Gernot Kubin
Affiliations:
Department of Electrical and Computer Engineering, Thammasat University, Pathumthani, Thailand;Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria;Signal Processing and Speech Communication Laboratory, Graz University of Technology, Graz, Austria
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

We develop the joint time-frequency segmentation algorithm, in which the wavelet packet coefficients of the analyzed speech signal are represented as tiles of a time-frequency representation adapted to the characteristics of the signal itself. The algorithm also decomposes the speech signal into a transient and a non-transient component: any block of wavelet packet coefficients whose tiling height is larger than or equal to its tiling width belongs to the transient component, and every other block belongs to the non-transient component. The transient component is selectively amplified and recombined with the original speech to generate modified speech, whose energy is adjusted to equal that of the original speech. The intelligibility of the original and modified speech is evaluated by 16 human listeners. Word recognition rates show that the modified speech is significantly more intelligible in background noise, with improvements ranging from 10% absolute at 0 dB to 27% absolute at -30 dB.
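The classification rule and the energy-matched recombination described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the plain-list signal representation, and the `gain` value are hypothetical, and the sketch assumes that tile height and width are given in comparable units and that the transient component has already been reconstructed in the time domain.

```python
import math


def is_transient(tile_height, tile_width):
    """Rule stated in the abstract: a tile whose height is greater than
    or equal to its width belongs to the transient component."""
    return tile_height >= tile_width


def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def enhance(original, transient, gain):
    """Amplify the transient component, add it to the original speech,
    and rescale so the result has the same energy as the original.

    `gain` is a hypothetical amplification factor; the abstract does not
    state the value used.
    """
    mixed = [o + gain * t for o, t in zip(original, transient)]
    mixed_energy = energy(mixed)
    if mixed_energy == 0.0:
        return mixed  # nothing to rescale
    scale = math.sqrt(energy(original) / mixed_energy)
    return [scale * m for m in mixed]


# Toy usage with made-up samples (not data from the paper).
original = [1.0, 0.0, -1.0, 0.5]
transient = [0.0, 0.0, -1.0, 0.5]
modified = enhance(original, transient, gain=3.0)
```

Rescaling by the square root of the energy ratio keeps the overall level of the modified speech equal to that of the original, so any intelligibility difference between the two cannot be attributed to a change in overall level.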