This paper describes a new language modeling technique for Tamil, a highly inflectional Dravidian language. It aims to alleviate the main problems encountered in processing Tamil, such as the enormous vocabulary growth caused by the large number of different forms derived from a single word. The vocabulary size was reduced by decomposing words into stems and endings and storing these sub-word units (morphemes) in the vocabulary separately. An enhanced morpheme-based language model was designed for Tamil and trained on the decomposed corpus. Perplexity and Word Error Rate (WER) were measured to evaluate the effectiveness of the model in a Tamil speech recognition system. The results were compared with word-based bigram and trigram language models, a distance-based language model, a dependency-based language model, and a class-based language model. The results show that the enhanced morpheme-based trigram model with Katz back-off smoothing improved the performance of the Tamil speech recognition system compared with the word-based language models.
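The vocabulary reduction described above can be illustrated with a minimal sketch. It is not the paper's decomposition method: the stems, the endings, the `decompose` helper, and the longest-suffix matching rule are all hypothetical, and the tokens are made-up strings rather than real Tamil morphology. The sketch only shows why storing stems and endings separately can shrink the vocabulary when each stem combines with many endings.

```python
# Hypothetical toy endings; NOT a real Tamil suffix inventory.
ENDINGS = ("ukku", "kal", "il", "ai")

# Hypothetical toy stems, chosen so that no bare stem ends in a toy ending.
STEMS = ("pada", "vara", "nila", "kuda")


def decompose(word, endings=ENDINGS, min_stem=3):
    """Split a word into [stem, '+ending'] using the longest matching ending.

    Assumption: simple longest-suffix stripping stands in for whatever
    morphological analysis the paper actually uses.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= min_stem:
            return [word[: -len(ending)], "+" + ending]
    return [word]  # no known ending: keep the word as a single unit


# Toy corpus: every stem appears bare and with every ending (20 word forms).
corpus = [stem + ending for stem in STEMS for ending in ("",) + ENDINGS]

# Word-based vocabulary: one entry per distinct surface form.
word_vocab = set(corpus)

# Morpheme-based vocabulary: stems and endings stored separately.
morph_vocab = {unit for word in corpus for unit in decompose(word)}

print(len(word_vocab), len(morph_vocab))  # prints: 20 8
```

In this toy setting the word-based vocabulary grows with the product of stems and endings, while the morpheme-based vocabulary grows with their sum. A language model would then be trained on the decomposed token sequence (for example `pada +kal`) instead of the surface forms.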