Comparison of performance of enhanced morpheme-based language model with different word-based language models for improving the performance of Tamil speech recognition system

  • Authors:
  • S. Saraswathi;T. V. Geetha

  • Affiliations:
  • Pondicherry Engineering College, Puducherry, India;College of Engineering, Anna University, Chennai, India

  • Venue:
  • ACM Transactions on Asian Language Information Processing (TALIP)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a new technique of language modeling for a highly inflectional Dravidian language, Tamil. It aims to alleviate the main problems encountered in processing of Tamil language, like enormous vocabulary growth caused by the large number of different forms derived from one word. The size of the vocabulary was reduced by, decomposing the words into stems and endings and storing these sub word units (morphemes) in the vocabulary separately. A enhanced morpheme-based language model was designed for the inflectional language Tamil. The enhanced morpheme-based language model was trained on the decomposed corpus. The perplexity and Word Error Rate (WER) were obtained to check the efficiency of the model for Tamil speech recognition system. The results were compared with word-based bigram and trigram language models, distance based language model, dependency based language model and class based language model. From the results it was analyzed that the enhanced morpheme-based trigram model with Katz back-off smoothing effect improved the performance of the Tamil speech recognition system when compared to the word-based language models.