Continuous Speech Recognition system for Tamil language using monophone-based Hidden Markov Model
Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Building a continuous speech recognizer for an Indian language such as Tamil is a challenging task because of inherent features of the language, such as long and short vowels, lack of aspirated stops, aspirated consonants, and many instances of allophones. Stress and accent in spoken Tamil vary from region to region, but in formal read Tamil speech they are ignored. There are three approaches to continuous speech recognition (CSR), distinguished by the unit that is modeled: word, phoneme, or syllable. Like other Indian languages, Tamil is syllabic in nature, and the pronunciation of words and sentences is strictly governed by a set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Tamil for small and restricted tasks. However, medium and large vocabulary CSR for Tamil is relatively new and largely unexplored. In this paper, the authors attempt to build Hidden Markov Model (HMM) based word and triphone acoustic models. The objective of this research is to build a small vocabulary word-based recognizer and a medium vocabulary triphone-based recognizer for continuous Tamil speech. In these experiments, a word-based Context Independent (CI) acoustic model for 371 unique words and a triphone-based Context Dependent (CD) acoustic model for 1700 unique words have been built. In addition to the acoustic models, a pronunciation dictionary with 44 base phones and a trigram-based statistical language model have been built as integral components of the linguist. These recognizers give very good word accuracy for trained and test sentences read by trained and new speakers.
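To illustrate what distinguishes a context-dependent triphone unit from a context-independent phone, the following minimal sketch expands pronunciation dictionary entries into triphone labels. It is an illustration only, not the authors' implementation: the function `to_triphones`, the `SIL` boundary symbol, the `left-center+right` label notation, and the two toy lexicon entries are all assumptions, and the phone symbols are not taken from the paper's 44-phone set.

```python
from typing import Dict, List


def to_triphones(phones: List[str], boundary: str = "SIL") -> List[str]:
    """Expand a phone sequence into left-center+right triphone labels.

    The word is padded with a boundary symbol (assumed here to be "SIL")
    so that the first and last phones also receive a context.
    """
    padded = [boundary] + phones + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Hypothetical toy lexicon; symbols are illustrative, not the paper's phone set.
lexicon: Dict[str, List[str]] = {
    "amma": ["a", "m", "m", "aa"],
    "vaa": ["v", "aa"],
}

# Map each word to its context-dependent units.
triphones = {word: to_triphones(phones) for word, phones in lexicon.items()}

for word, units in triphones.items():
    print(word, units)
```

In this sketch the two `m` phones in the first entry map to different units (`a-m+m` and `m-m+aa`) because their neighbors differ, whereas a context-independent model would share a single `m` model for both.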