Limited error based event localizing temporal decomposition and its application to variable-rate speech coding

Authors:
Phu Chien Nguyen;Masato Akagi;Binh Phu Nguyen
Affiliations:
Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan;Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan;Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
Venue:
Speech Communication
Year:
2007

Abstract

This paper proposes a novel algorithm for temporal decomposition (TD) of speech, called 'limited error based event localizing temporal decomposition' (LEBEL-TD), and describes its application to variable-rate speech coding. In previous work, TD analysis was usually performed on speech segments of about 200-300 ms or more, making it impractical for online applications. In the present work, events are localized using a limited error criterion and a local optimization strategy, which results in an average algorithmic delay of 65 ms. Simulation results show that an average log spectral distortion of about 1.5 dB is achievable at an event rate of 20 events/s. In addition, LEBEL-TD uses neither the computationally costly singular value decomposition routine nor the event refinement process, which significantly reduces the computational cost of TD. Further, a variable-rate speech coding method with an average rate of around 1.8 kbps is realized using LEBEL-TD on top of STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum), a high-quality speech analysis-synthesis framework. Subjective test results indicate that the performance of the proposed speech coding method is comparable to that of the 4.8 kbps FS-1016 CELP coder.
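
The abstract reports reconstruction quality as an average log spectral distortion in dB. As a rough illustration only, the sketch below computes one common form of that measure: the root-mean-square difference between two power spectra on a dB scale, averaged over frames. The abstract does not state the exact definition used in the paper, so the formula, the function names (`log_spectral_distortion`, `average_lsd`) and the toy spectra are all assumptions made for this example.

```python
import math


def log_spectral_distortion(ref_power, est_power):
    """RMS difference in dB between two power spectra of a single frame.

    Assumed definition (not taken from the paper): the square root of the
    mean squared difference of the two spectra expressed in dB.
    Both inputs are sequences of strictly positive power values.
    """
    if len(ref_power) != len(est_power) or len(ref_power) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_sum = 0.0
    for p, q in zip(ref_power, est_power):
        diff_db = 10.0 * math.log10(p) - 10.0 * math.log10(q)
        squared_sum += diff_db * diff_db
    return math.sqrt(squared_sum / len(ref_power))


def average_lsd(ref_frames, est_frames):
    """Mean of the per-frame distortions over an utterance."""
    if len(ref_frames) != len(est_frames) or len(ref_frames) == 0:
        raise ValueError("frame lists must be non-empty and of equal length")
    per_frame = [
        log_spectral_distortion(r, e) for r, e in zip(ref_frames, est_frames)
    ]
    return sum(per_frame) / len(per_frame)


# Toy spectra, hypothetical values chosen only to exercise the functions.
reference = [[10.0, 10.0], [4.0, 2.0]]
estimate = [[1.0, 1.0], [4.0, 2.0]]
print(average_lsd(reference, estimate))
```

In this toy input the first frame differs by a constant factor of 10 in power (10 dB in every bin) and the second frame is identical, so the average is half of the first frame's distortion.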