Assamese is a morphologically rich, agglutinative Indic language with relatively free word order. Although it is spoken by nearly 30 million people, very little computational linguistic work has been done on it. In this paper, we present our work on part-of-speech (POS) tagging for Assamese using a Hidden Markov Model. Since no well-defined, suitable tagset was available, we developed a tagset of 172 tags in consultation with experts in linguistics. To support accurate tagging, we examine the relevant linguistic issues in Assamese. For unknown words, we perform a simple morphological analysis to determine probable tags. Using a manually tagged corpus of about 10,000 words for training, we obtain a tagging accuracy of nearly 87% on test inputs.
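
The general approach described above (an HMM tagger that falls back on morphology for unknown words) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a bigram HMM with add-one smoothing and Viterbi decoding, and the toy English corpus, the three-tag tagset, and the `SUFFIX_TAGS` rules are hypothetical stand-ins for the Assamese corpus, the 172-tag tagset, and the morphological analysis mentioned in the abstract.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (English, 3 tags); the paper uses a manually
# tagged Assamese corpus and a 172-tag tagset, neither reproduced here.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

# Hypothetical suffix rules standing in for "simple morphological analysis".
SUFFIX_TAGS = {"s": ["VERB", "NOUN"], "ly": ["ADV"]}


def train(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log-probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def candidate_tags(word, tags, emit, known):
    """Known word: tags seen with it. Unknown word: tags suggested by suffix."""
    if known:
        return [t for t in tags if emit[t][word] > 0]
    for suffix in sorted(SUFFIX_TAGS, key=len, reverse=True):
        if word.endswith(suffix):
            guessed = [t for t in SUFFIX_TAGS[suffix] if t in tags]
            if guessed:
                return guessed
    return list(tags)             # no rule applies: allow every tag


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for t in tags for w in emit[t]}
    best = []                     # best[i][tag] = (score, previous tag)
    for i, word in enumerate(words):
        known = word in vocab
        column = {}
        for tag in candidate_tags(word, tags, emit, known):
            # Unknown words get a constant emission score, so the
            # transition model chooses among the candidate tags.
            e = log_prob(emit[tag], word, len(vocab)) if known else 0.0
            if i == 0:
                column[tag] = (log_prob(trans["<s>"], tag, len(tags)) + e, None)
            else:
                score, prev = max(
                    (best[i - 1][p][0] + log_prob(trans[p], tag, len(tags)), p)
                    for p in best[i - 1]
                )
                column[tag] = (score + e, prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


trans, emit = train(TRAIN)
print(viterbi(["the", "cat", "runs"], trans, emit))  # "runs" is unseen
```

Here the unseen word "runs" is restricted to the tags its suffix suggests, and the transition probabilities pick among them. A fuller morphological analyzer would replace the suffix table, but the division of labor would stay the same.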