Verbs are where all the action lies: experiences of shallow parsing of a morphologically rich language

Authors:
Harshada Gune;Mugdha Bapat;Mitesh M. Khapra;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology Bombay;Indian Institute of Technology Bombay;Indian Institute of Technology Bombay;Indian Institute of Technology Bombay
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

Verb suffixes and verb complexes of morphologically rich languages carry a lot of information. We show that this information, if harnessed for the task of shallow parsing, can lead to dramatic improvements in accuracy for a morphologically rich language, Marathi. The crux of the approach is to use a powerful morphological analyzer, backed by a high-coverage lexicon, to generate rich features for a CRF-based sequence classifier. Accuracy figures of 94% for part-of-speech tagging and 97% for chunking, obtained with a modestly sized corpus (20K words), vindicate our claim that for morphologically rich languages linguistic insight can obviate the need for large annotated corpora.
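
The idea of turning morphological analyses into features for a sequence classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_analyze` is a hypothetical stand-in for the morphological analyzer, the English suffixes are placeholders rather than Marathi morphology, and the feature names are invented here, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a morphological analyzer. English suffixes are
# placeholders only; the paper's analyzer and lexicon are not reproduced here.
TOY_SUFFIXES: List[Tuple[str, Dict[str, str]]] = [
    ("ing", {"category": "verb", "aspect": "progressive"}),
    ("ed", {"category": "verb", "tense": "past"}),
]


def toy_analyze(token: str) -> Dict[str, str]:
    """Return a toy analysis (root, suffix, suffix attributes) for one token."""
    for suffix, attrs in TOY_SUFFIXES:
        # Require a root of at least two characters so "red" is not split.
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return {"root": token[: -len(suffix)], "suffix": suffix, **attrs}
    return {"root": token, "suffix": "", "category": "unknown"}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build features for tokens[i] from its own and its neighbours' analyses."""
    feats = {"word": tokens[i].lower()}
    for offset, prefix in ((0, "cur"), (-1, "prev"), (1, "next")):
        j = i + offset
        if 0 <= j < len(tokens):
            for key, value in toy_analyze(tokens[j].lower()).items():
                feats[f"{prefix}_{key}"] = value
        else:
            # Mark sentence boundaries instead of a neighbour analysis.
            feats[f"{prefix}_pad"] = "BOS" if j < 0 else "EOS"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for feats in sentence_features(["She", "walked", "home"]):
        print(feats)
```

Each sentence becomes a list of per-token feature dictionaries, a shape that CRF toolkits accepting dictionary features can consume directly. A real system would replace `toy_analyze` with the output of an actual analyzer and lexicon.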