Automatic stochastic tagging of natural language texts
Computational Linguistics
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
A practical part-of-speech tagger
ANLC '92 Proceedings of the third conference on Applied natural language processing
Morphological richness offsets resource demand- experiences in constructing a POS tagger for Hindi
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Learning-based named entity recognition for morphologically-rich, resource-scarce languages
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
An Information-Extraction System for Urdu---A Resource-Poor Language
ACM Transactions on Asian Language Information Processing (TALIP)
Hi-index | 0.00 |
This paper describes our work on building Part-of-Speech (POS) tagger for Bengali. We have use Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language and our taggers make use of morphological and contextual information of the words. Since only a small labeled training set is available (45,000 words), simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. We find that the use of morphology helps improve the accuracy of the tagger especially when less amount of tagged corpora are available.