Automatic part-of-speech tagging for Bengali: an approach for morphologically rich languages in a poor resource scenario

Authors:
Sandipan Dandapat;Sudeshna Sarkar;Anupam Basu
Affiliations:
Indian Institute of Technology, Kharagpur, India;Indian Institute of Technology, Kharagpur, India;Indian Institute of Technology, Kharagpur, India
Venue:
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Year:
2007

Citing 4
Cited 2

Automatic stochastic tagging of natural language texts

Computational Linguistics
TnT: a statistical part-of-speech tagger

ANLC '00 Proceedings of the sixth conference on Applied natural language processing
A practical part-of-speech tagger

ANLC '92 Proceedings of the third conference on Applied natural language processing
Morphological richness offsets resource demand- experiences in constructing a POS tagger for Hindi

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

Learning-based named entity recognition for morphologically-rich, resource-scarce languages

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
An Information-Extraction System for Urdu---A Resource-Poor Language

ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes our work on building Part-of-Speech (POS) tagger for Bengali. We have use Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language and our taggers make use of morphological and contextual information of the words. Since only a small labeled training set is available (45,000 words), simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. We find that the use of morphology helps improve the accuracy of the tagger especially when less amount of tagged corpora are available.