Automatic part-of-speech tagging for Bengali: an approach for morphologically rich languages in a poor resource scenario

  • Authors:
  • Sandipan Dandapat;Sudeshna Sarkar;Anupam Basu

  • Affiliations:
  • Indian Institute of Technology, Kharagpur, India;Indian Institute of Technology, Kharagpur, India;Indian Institute of Technology, Kharagpur, India

  • Venue:
  • ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
  • Year:
  • 2007

Abstract

This paper describes our work on building a Part-of-Speech (POS) tagger for Bengali. We have used Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language, and our taggers make use of both the morphological and the contextual information of words. Since only a small labeled training set is available (45,000 words), a simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. We find that the use of morphology helps improve the accuracy of the tagger, especially when only a small amount of tagged corpus is available.
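
The idea of combining an HMM tagger with a morphological analyzer can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bigram HMM with add-one smoothing, and it assumes the analyzer is used to restrict the candidate tags of each word during Viterbi decoding. The names `train_hmm`, `viterbi`, and `toy_analyzer`, the tag labels, and the placeholder tokens `w1` to `w4` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Count tag-bigram transitions and word emissions from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            prev = tag
    return trans, emit, sorted(tags)


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, analyzer=None):
    """Return the most likely tag sequence for `words`.

    If `analyzer` is given, it maps a word to a set of possible tags, and
    decoding only considers those tags for that word. Words the analyzer
    does not cover fall back to the full tag set.
    """
    vocab = {w for t in emit for w in emit[t]}
    n_words = len(vocab) + 1  # one extra slot for unknown words
    n_tags = len(tags)

    def allowed(word):
        if analyzer is not None:
            candidates = [t for t in tags if t in analyzer(word)]
            if candidates:
                return candidates
        return tags

    # best[tag] = (log score, tag path) of the best path ending in `tag`
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for tag in allowed(word):
            e = log_prob(emit[tag], word, n_words)
            score, path = max(
                ((s + log_prob(trans[p], tag, n_tags) + e, pth)
                 for p, (s, pth) in best.items()),
                key=lambda item: item[0],
            )
            new_best[tag] = (score, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy training data with placeholder tokens (not real Bengali data).
train = [
    [("w1", "NOUN"), ("w2", "VERB")],
    [("w3", "ADJ"), ("w1", "NOUN"), ("w2", "VERB")],
]
trans, emit, tags = train_hmm(train)

# Hypothetical analyzer: knows that the unseen word "w4" can only be a noun.
toy_lexicon = {"w4": {"NOUN"}}


def toy_analyzer(word):
    return toy_lexicon.get(word, set())


print(viterbi(["w3", "w4", "w2"], trans, emit, tags, analyzer=toy_analyzer))
```

In this sketch the analyzer matters most for words that never occur in the training data: without it, an unseen word's tag is decided by transition probabilities and smoothing alone, whereas the restricted tag set removes candidates the morphology rules out. This is one plausible reading of why morphological information would help more when the tagged corpus is small.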