Large-coverage root lexicon extraction for Hindi

  • Authors:
  • Cohan Sujay Carlos;Monojit Choudhury;Sandipan Dandapat

  • Affiliations:
  • Microsoft Research India;Microsoft Research India;Microsoft Research India

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a method using morphological rules and heuristics, for the automatic extraction of large-coverage lexicons of stems and root word-forms from a raw text corpus. We cast the problem of high-coverage lexicon extraction as one of stemming followed by root word-form selection. We examine the use of POS tagging to improve precision and recall of stemming and thereby the coverage of the lexicon. We present accuracy, precision and recall scores for the system on a Hindi corpus.