TRUES: Tone Recognition Using Extended Segments

  • Authors: Jiang-Chun Chen; Jyh-Shing Roger Jang
  • Affiliations: National Tsing Hua University, Taiwan; National Tsing Hua University, Taiwan
  • Venue: ACM Transactions on Asian Language Information Processing (TALIP)
  • Year: 2008

Abstract

Tone recognition has been a basic but important task for speech recognition and assessment of tonal languages, such as Mandarin Chinese. Most previously proposed approaches adopt a two-step approach where syllables within an utterance are identified via forced alignment first, and tone recognition using a variety of classifiers, such as neural networks, Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM), is then performed on each segmented syllable to predict its tone. However, forced alignment does not always generate accurate syllable boundaries, leading to unstable voiced-unvoiced detection and deteriorating performance in tone recognition. Aiming to alleviate this problem, we propose a robust approach called Tone Recognition Using Extended Segments (TRUES) for HMM-based continuous tone recognition. The proposed approach extracts an unbroken pitch contour from a given utterance based on dynamic programming over time-domain acoustic features of average magnitude difference function (AMDF). The pitch contour of each syllable is then extended for tri-tone HMM modeling, such that the influence from inaccurate syllable boundaries is lessened. Our experimental results demonstrate that the proposed TRUES achieves 49.13% relative error rate reduction over that of the recently proposed supratone modeling, which is deemed the state of the art of tone recognition that outperforms several previously proposed approaches. The encouraging improvement demonstrates the effectiveness and robustness of the proposed TRUES, as well as the corresponding pitch determination algorithm which produces unbroken pitch contours.
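
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: compute AMDF per frame, pick one lag per frame by dynamic programming so that the contour stays unbroken, then widen a syllable's slice of the contour past its (possibly inaccurate) boundaries. The function names, the linear jump penalty, the frame parameters, and the fixed extension margin are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def amdf(frame, min_lag, max_lag):
    """Average magnitude difference function for lags min_lag..max_lag."""
    n = len(frame)
    values = []
    for lag in range(min_lag, max_lag + 1):
        total = sum(abs(frame[i] - frame[i + lag]) for i in range(n - lag))
        values.append(total / (n - lag))
    return values


def track_pitch(signal, sample_rate, frame_len, hop,
                f_min=80.0, f_max=400.0, jump_penalty=0.05):
    """Return one pitch value (Hz) per frame, with no unvoiced gaps.

    Assumed cost: AMDF value at the chosen lag plus a linear penalty
    on the lag change between consecutive frames (hypothetical choice).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    n_lags = max_lag - min_lag + 1
    costs = [amdf(signal[s:s + frame_len], min_lag, max_lag)
             for s in range(0, len(signal) - frame_len + 1, hop)]

    # Forward pass: best accumulated cost of ending at each lag index.
    best = costs[0][:]
    back = []
    for frame_cost in costs[1:]:
        new_best, pointers = [], []
        for j in range(n_lags):
            k = min(range(n_lags),
                    key=lambda p: best[p] + jump_penalty * abs(p - j))
            new_best.append(best[k] + jump_penalty * abs(k - j) + frame_cost[j])
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Backward pass: recover the lowest-cost lag path.
    j = min(range(n_lags), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [sample_rate / (min_lag + j) for j in path]


def extend_segment(contour, start, end, margin):
    """Slice contour[start:end] widened by `margin` frames on each side."""
    return contour[max(0, start - margin):min(len(contour), end + margin)]


if __name__ == "__main__":
    # Synthetic 200 Hz tone sampled at 8 kHz (test data, not from the paper).
    tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(1024)]
    pitch = track_pitch(tone, 8000, 256, 128, f_min=110.0)
    print(pitch)
    print(extend_segment(pitch, 2, 4, 1))
```

Because every frame is assigned a lag, the contour has no voiced-unvoiced breaks, which is what makes extending a segment across a syllable boundary straightforward in this sketch. Note that `frame_len` must exceed the largest lag, and that a lag range spanning more than one octave can admit octave errors on periodic input; a real system would need to handle both.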