Comparison of performance of enhanced morpheme-based language model with different word-based language models for improving the performance of Tamil speech recognition system

Authors:
S. Saraswathi;T. V. Geetha
Affiliations:
Pondicherry Engineering College, Puducherry, India;College of Engineering, Anna University, Chennai, India
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2007

Abstract

This paper describes a new language modeling technique for Tamil, a highly inflectional Dravidian language. It aims to alleviate the main problems encountered in processing Tamil, such as the enormous vocabulary growth caused by the large number of different forms derived from a single word. The vocabulary size was reduced by decomposing words into stems and endings and storing these sub-word units (morphemes) separately in the vocabulary. An enhanced morpheme-based language model was designed for Tamil and trained on the decomposed corpus. Perplexity and word error rate (WER) were measured to assess the effectiveness of the model in a Tamil speech recognition system. The results were compared with word-based bigram and trigram language models, a distance-based language model, a dependency-based language model, and a class-based language model. The results indicate that the enhanced morpheme-based trigram model with Katz back-off smoothing improved the performance of the Tamil speech recognition system compared with the word-based language models.
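The two evaluation measures named in the abstract have standard textbook definitions, sketched below in Python for readers unfamiliar with them. This is a generic illustration and not the authors' evaluation code: the function names `word_error_rate` and `perplexity` are hypothetical, and the inputs in the usage note are placeholder tokens, not data from the paper. WER is taken as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and perplexity as the exponential of the negative mean log-probability the model assigns to each token.

```python
import math


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)) over per-token model probabilities."""
    if not token_probs:
        raise ValueError("at least one token probability is required")
    if any(p <= 0.0 for p in token_probs):
        raise ValueError("probabilities must be strictly positive")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words, and `perplexity([0.25, 0.25, 0.25, 0.25])` corresponds to a model that is, on average, choosing uniformly among four tokens.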