Large-coverage root lexicon extraction for Hindi

Authors:
Cohan Sujay Carlos;Monojit Choudhury;Sandipan Dandapat
Affiliations:
Microsoft Research India;Microsoft Research India;Microsoft Research India
Venue:
EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2009

Citing 5
Cited 0

Feature-rich part-of-speech tagging with a cyclic dependency network

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
An unsupervised Hindi stemmer with heuristic improvements

Proceedings of the second workshop on Analytics for noisy unstructured text data
Induction of a stem lexicon for two-level morphological analysis

NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
Morphological lexicon extraction from raw text data

FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Automatic acquisition of a slovak lexicon from a raw corpus

TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes a method using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage of the lexicon. We present accuracy, precision and recall scores for the system on a Hindi corpus.