TRUES: Tone Recognition Using Extended Segments

Authors:
Jiang-Chun Chen; Jyh-Shing Roger Jang
Affiliations:
National Tsing Hua University, Taiwan; National Tsing Hua University, Taiwan
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2008

Abstract

Tone recognition is a basic but important task for speech recognition and assessment of tonal languages, such as Mandarin Chinese. Most previously proposed methods adopt a two-step approach: syllables within an utterance are first identified via forced alignment, and tone recognition is then performed on each segmented syllable to predict its tone, using a variety of classifiers such as neural networks, Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). However, forced alignment does not always generate accurate syllable boundaries, leading to unstable voiced-unvoiced detection and deteriorating performance in tone recognition. Aiming to alleviate this problem, we propose a robust approach called Tone Recognition Using Extended Segments (TRUES) for HMM-based continuous tone recognition. The proposed approach extracts an unbroken pitch contour from a given utterance based on dynamic programming over time-domain acoustic features of the average magnitude difference function (AMDF). The pitch contour of each syllable is then extended for tri-tone HMM modeling, such that the influence of inaccurate syllable boundaries is lessened. Our experimental results demonstrate that the proposed TRUES achieves a 49.13% relative error rate reduction over the recently proposed supratone modeling, which is regarded as the state of the art in tone recognition and outperforms several previously proposed approaches. The encouraging improvement demonstrates the effectiveness and robustness of the proposed TRUES, as well as of the corresponding pitch determination algorithm, which produces unbroken pitch contours.
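
The pitch determination step named in the abstract, dynamic programming over AMDF features to obtain an unbroken pitch contour, can be illustrated with a minimal sketch. The abstract does not give the actual cost function, lag range, or smoothness term, so everything below is an assumption: the function names `amdf` and `track_pitch`, the linear `jump_penalty` on lag changes, and the default frequency range are hypothetical and are not the authors' implementation. The sketch only shows the general idea: every frame receives a pitch value (no voiced-unvoiced decision), and the lag path is chosen jointly over the whole utterance.

```python
import math


def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function of one frame for each candidate lag."""
    n = len(frame)
    values = []
    for lag in range(min_lag, max_lag + 1):
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag))
        values.append(total / (n - lag))
    return values


def track_pitch(signal, sample_rate, frame_len, hop,
                f_min=80.0, f_max=400.0, jump_penalty=0.02):
    """Return one pitch value (Hz) per frame via a Viterbi-style search.

    Assumed cost (not taken from the paper): the sum of per-frame AMDF
    values along the lag path plus jump_penalty * |lag change| between
    neighbouring frames. frame_len must be larger than the longest lag.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if frame_len <= max_lag:
        raise ValueError("frame_len must exceed the longest candidate lag")

    starts = range(0, len(signal) - frame_len + 1, hop)
    costs = [amdf(signal[s:s + frame_len], min_lag, max_lag) for s in starts]
    if not costs:
        return []
    n_lags = max_lag - min_lag + 1

    # Forward pass: best accumulated cost of ending at each lag.
    best = list(costs[0])
    backpointers = []
    for t in range(1, len(costs)):
        prev = best
        current, pointers = [], []
        for j in range(n_lags):
            k_best = min(range(n_lags),
                         key=lambda k: prev[k] + jump_penalty * abs(j - k))
            current.append(costs[t][j] + prev[k_best]
                           + jump_penalty * abs(j - k_best))
            pointers.append(k_best)
        best = current
        backpointers.append(pointers)

    # Backward pass: recover the cheapest lag path.
    j = min(range(n_lags), key=lambda k: best[k])
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()

    # Convert lags (in samples) to frequencies; no frame is left without a value.
    return [sample_rate / (min_lag + j) for j in path]
```

As a usage example, calling `track_pitch` on a synthetic 200 Hz sine sampled at 8 kHz, with `frame_len=256`, `hop=80`, and `f_min=120.0` so that the octave-related lag is excluded, yields a contour with one value per frame. With a wider lag range a pure tone has several equally good lags, which is one reason practical pitch trackers add further constraints.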