We have designed a Turkish dictation system for a newspaper content transcription application. Turkish is an agglutinative language with free word order. These characteristics cause a vocabulary explosion, a large number of out-of-vocabulary (OOV) words, and increased complexity of n-gram language models when words are used as recognition units in speech recognition. In this paper, alternative language modeling units, namely "stems and endings", "stems and morphemes", and "syllables", are investigated instead of "words". These recognition units are compared in terms of vocabulary size, coverage, bigram perplexity, and speech recognition performance. A combined model is proposed that aims to balance the OOV rate against the amount of phoneme sequence constraint on the recognition units. The proposed model yields letter error rates (LERs) of approximately 28% for a speaker-independent system and 20% for a speaker-dependent system; both are lower than the error rates of the traditional word-based model for the newspaper content transcription application.
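The trade-off the abstract describes can be illustrated with a minimal sketch: when every inflected word form is its own vocabulary entry, unseen inflections of known stems all count as OOV, whereas subword units let stems be shared across inflections. The fixed-length "stem + ending" split below is a hypothetical stand-in for a real Turkish morphological analyzer, and the tiny word lists are invented for illustration only.

```python
def segment(word, stem_len=3):
    # Hypothetical segmentation: a fixed-length stem plus the remainder
    # as an "ending". A real system would use morphological analysis.
    if len(word) <= stem_len:
        return [word]
    return [word[:stem_len], word[stem_len:]]

def oov_rate(train_units, test_units):
    # Fraction of test units absent from the training vocabulary.
    vocab = set(train_units)
    return sum(1 for u in test_units if u not in vocab) / len(test_units)

# Toy data: inflected forms of the stems "ev" (house) and "okul" (school).
train = "evler evlerde evden okullar okulda".split()
test = "evlere okuldan".split()

# Word units: every inflected form is a separate vocabulary entry,
# so unseen inflections of known stems are all OOV.
word_oov = oov_rate(train, test)

# Subword units: stems recur across inflections, improving coverage.
seg_train = [u for w in train for u in segment(w)]
seg_test = [u for w in test for u in segment(w)]
sub_oov = oov_rate(seg_train, seg_test)

print("word OOV rate:", word_oov)
print("subword OOV rate:", sub_oov)
```

On this toy data the word-based vocabulary misses both test words, while the subword split recovers their stems; the remaining OOV endings are what the paper's combined model trades off against phoneme sequence constraints.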