An HMM-based method for Thai spelling speech recognition

Authors:
C. Pisarn;T. Theeramunkong
Affiliations:
Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Road, Bangkadi, Muang, Pathumthani 12000, Thailand;Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Road, Bangkadi, Muang, Pathumthani 12000, Thailand
Venue:
Computers & Mathematics with Applications
Year:
2007

Abstract

Spelling speech recognition serves several purposes, including enhancing speech recognition systems and implementing name retrieval systems. This paper presents an approach, based on hidden Markov models (HMMs), to constructing three recognizers for the three commonly used Thai spelling methods. Thai phonetic characteristics, the Thai alphabet system and the spelling methods are analyzed. For the first spelling method, two recognizers are explored: one trained on a small spelling corpus and the other on an existing large continuous speech corpus. To compensate for the difference in utterance speed between spelling utterances and continuous speech utterances, utterance speed adjustment is taken into account. Two alternative language models, bigram and trigram, are investigated, and spelling recognition performance is evaluated under three language-model environments: close-type, open-type and mix-type. For the first spelling method, the approach achieves up to 93.09% letter correct rate (LCR) and 92.45% letter accuracy (LA) with a trigram language model under the mix-type environment and an acoustic model trained on the small spelling corpus. Under the same conditions, it obtains 81.12% LCR and 76.32% LA for the second spelling method, and 78.47% LCR and 71.75% LA for the third spelling method. Error analysis shows that the main source of errors is letter substitution, mostly triggered by confusion between similar consonant phones and between short/long vowel pairs.
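The abstract reports LCR and LA without defining them. A minimal sketch of how such scores are commonly computed is given below, assuming the usual alignment-based definitions: LCR = H / N and LA = (H - I) / N, where N is the number of reference letters, H the number of correctly recognized letters and I the number of inserted letters. These definitions, the function names `align_counts` and `letter_rates`, and the Latin-letter example are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) from a
    minimum edit distance alignment of hyp against ref."""
    n, m = len(ref), len(hyp)
    # table[i][j] = (distance, hits, subs, dels, ins) for ref[:i] vs hyp[:j]
    table = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d, h, s, de, ins = table[i - 1][0]
        table[i][0] = (d + 1, h, s, de + 1, ins)      # only deletions
    for j in range(1, m + 1):
        d, h, s, de, ins = table[0][j - 1]
        table[0][j] = (d + 1, h, s, de, ins + 1)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, s, de, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d, h + 1, s, de, ins)         # hit
            else:
                diag = (d + 1, h, s + 1, de, ins)     # substitution
            d, h, s, de, ins = table[i - 1][j]
            up = (d + 1, h, s, de + 1, ins)           # deletion
            d, h, s, de, ins = table[i][j - 1]
            left = (d + 1, h, s, de, ins + 1)         # insertion
            table[i][j] = min(diag, up, left, key=lambda t: t[0])
    _, hits, subs, dels, inss = table[n][m]
    return hits, subs, dels, inss


def letter_rates(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (LCR, LA) in percent under the assumed definitions."""
    hits, _subs, _dels, inss = align_counts(ref, hyp)
    n = len(ref)
    lcr = 100.0 * hits / n            # insertions are not penalized
    la = 100.0 * (hits - inss) / n    # insertions are penalized
    return lcr, la


if __name__ == "__main__":
    # Hypothetical letter sequences, not data from the paper.
    reference = ["k", "a", "n", "o", "k"]
    recognized = ["k", "a", "m", "o", "k", "k"]
    print(letter_rates(reference, recognized))
```

Under these definitions LA can never exceed LCR, because it additionally penalizes inserted letters; this is consistent with every LCR/LA pair reported above. Substitution errors, which the abstract identifies as the dominant error type, lower both scores equally.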