Verb suffixes and verb complexes in morphologically rich languages carry a great deal of information. We show that this information, if harnessed for shallow parsing, can lead to dramatic improvements in accuracy for one such language, Marathi. The crux of the approach is to use a powerful morphological analyzer, backed by a high-coverage lexicon, to generate rich features for a CRF-based sequence classifier. Accuracy figures of 94% for part-of-speech tagging and 97% for chunking, obtained with a modestly sized corpus (20K words), vindicate our claim that for morphologically rich languages linguistic insight can obviate the need for large annotated corpora.
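The idea of turning morphological analysis into features for a sequence classifier can be illustrated with a minimal sketch. It is not the authors' implementation. The lexicon, the suffix table, and the function names `analyze`, `token_features`, and `sentence_features` are all hypothetical, and English placeholder words stand in for the Marathi resources, which the abstract does not list.

```python
from typing import Dict, List

# Hypothetical toy resources. English placeholders stand in for a real
# Marathi lexicon and suffix table, which the source does not provide.
LEXICON: Dict[str, str] = {"walk": "VERB", "dog": "NOUN", "the": "DET"}
SUFFIXES: Dict[str, Dict[str, str]] = {
    "ing": {"aspect": "progressive"},
    "ed": {"tense": "past"},
    "s": {"number": "plural"},
}


def analyze(word: str) -> Dict[str, str]:
    """Return root, lexical category, and suffix information for one word."""
    w = word.lower()
    if w in LEXICON:
        return {"root": w, "cat": LEXICON[w], "suffix": ""}
    # Try longer suffixes first so the most specific match wins.
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        root = w[: -len(suf)]
        if w.endswith(suf) and root in LEXICON:
            return {"root": root, "cat": LEXICON[root], "suffix": suf, **SUFFIXES[suf]}
    # Unknown word: fall back to the surface form with an UNK category.
    return {"root": w, "cat": "UNK", "suffix": ""}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build features for token i from its own analysis and its neighbours'."""
    feats = {"word": tokens[i].lower()}
    for offset, prefix in ((-1, "prev_"), (0, ""), (1, "next_")):
        j = i + offset
        if 0 <= j < len(tokens):
            for key, value in analyze(tokens[j]).items():
                feats[prefix + key] = value
        else:
            # Mark sentence boundaries explicitly.
            feats[prefix + "cat"] = "PAD"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Each sentence becomes a list of per-token feature dictionaries, a shape that many CRF toolkits accept as input. In this sketch the suffix-derived features (`tense`, `number`, and so on) are what carry the morphological information the abstract refers to.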