Morphological richness offsets resource demand- experiences in constructing a POS tagger for Hindi

Authors:
Smriti Singh;Kuhoo Gupta;Manish Shrivastava;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology, Mumbai, Maharashtra, India (all four authors)
Venue:
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Year:
2006


Abstract

In this paper we report our work on building a POS tagger for Hindi, a morphologically rich language. The theme of the research is to support the position that if morphology is strong and harnessable, then the lack of training corpora is not debilitating. We establish a POS tagging methodology that resource-disadvantaged languages (those lacking annotated corpora) can use. The methodology combines a locally annotated, modestly sized corpus (15,562 words), exhaustive morphological analysis backed by a high-coverage lexicon, and a decision-tree-based learning algorithm (CN2). The system was evaluated with 4-fold cross-validation on a news-domain corpus (www.bbc.co.uk/hindi). The current POS tagging accuracy is 93.45% and can be further improved.
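
The 4-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the most-frequent-tag baseline stands in for the morphology-plus-CN2 tagger, and the function names (`k_fold_indices`, `train_most_frequent_tag`, `cross_validate`), the contiguous fold layout, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k=4):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_most_frequent_tag(tagged_tokens):
    """Hypothetical stand-in tagger: map each word to its most frequent tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in tagged_tokens:
        per_word[word][tag] += 1
        overall[tag] += 1
    default_tag = overall.most_common(1)[0][0]  # fallback for unseen words
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lexicon, default_tag


def cross_validate(tagged_tokens, k=4):
    """Return (mean accuracy, per-fold accuracies) over k held-out folds."""
    accuracies = []
    for held_out in k_fold_indices(len(tagged_tokens), k):
        held = set(held_out)
        train = [t for i, t in enumerate(tagged_tokens) if i not in held]
        test = [tagged_tokens[i] for i in held_out]
        lexicon, default_tag = train_most_frequent_tag(train)
        correct = sum(lexicon.get(w, default_tag) == tag for w, tag in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k, accuracies


if __name__ == "__main__":
    # Toy (word, tag) pairs, invented for illustration only.
    toy = [("ladkaa", "N"), ("gayaa", "V")] * 4
    mean_acc, per_fold = cross_validate(toy, k=4)
    print(mean_acc, per_fold)
```

With a corpus of 15,562 tokens, this fold layout gives two folds of 3,891 tokens and two of 3,890, so each model would be trained on roughly three quarters of the data and scored on the rest. A real evaluation would more likely split on sentence boundaries than on raw token positions.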