In this paper we report our work on building a POS tagger for Hindi, a morphologically rich language. The theme of the research is to vindicate the position that, if morphology is strong and harnessable, a lack of training corpora is not debilitating. We establish a POS tagging methodology that resource-disadvantaged languages (those lacking annotated corpora) can use. The methodology combines a locally annotated, modestly sized corpus (15,562 words), exhaustive morphological analysis backed by a high-coverage lexicon, and a decision-tree-based learning algorithm (CN2). The system was evaluated with 4-fold cross-validation on news-domain text (www.bbc.co.uk/hindi). The current POS tagging accuracy is 93.45% and can be improved further.
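The 4-fold cross-validation protocol mentioned above can be illustrated with a minimal sketch. It is not the authors' system: the learner here is a most-frequent-tag baseline standing in for the morphology-driven CN2 learner, and the names `k_fold_splits`, `train_baseline`, and `cross_validate`, as well as the fold assignment by sentence index, are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def k_fold_splits(items, k=4):
    """Yield (train, test) lists; fold i holds every k-th item starting at i."""
    for i in range(k):
        test = [x for j, x in enumerate(items) if j % k == i]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def train_baseline(tagged_sentences):
    """Hypothetical stand-in learner: tag each word with its most frequent
    training tag, falling back to the most frequent tag overall."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    default = all_tags.most_common(1)[0][0]
    table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: table.get(word, default)


def cross_validate(tagged_sentences, k=4, train_fn=train_baseline):
    """Return the per-fold token accuracies and their mean."""
    accuracies = []
    for train, test in k_fold_splits(tagged_sentences, k):
        tagger = train_fn(train)
        total = sum(len(s) for s in test)
        correct = sum(tagger(w) == t for s in test for w, t in s)
        accuracies.append(correct / total)
    return accuracies, sum(accuracies) / len(accuracies)
```

Each sentence is a list of (word, tag) pairs, and the reported figure is the mean token-level accuracy over the four held-out folds. Any other learner can be substituted through `train_fn`, provided it returns a function from a word to a tag.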