Automatic lexical acquisition from raw corpora: an application to Russian

  • Authors:
  • Antoni Oliver; Irene Castellón; Lluís Màrquez

  • Affiliations:
  • U. Oberta de Catalunya; GRIAL Group, Lingüística General - UB; TALP Research Center, LSI, UPC

  • Venue:
  • MorphSlav '03: Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
  • Year:
  • 2003

Abstract

This paper presents a methodology for the automatic acquisition of lexical and morpho-syntactic information from raw corpora. The system uses information about inflectional morphology declared by rules and is based on the co-occurrence of different forms of the same paradigm in the corpus. A direct application of this methodology gives very poor precision because of rule interaction between paradigms. We present a rule analysis algorithm that solves this problem and yields considerably better precision, although recall decreases dramatically. Finally, we investigate several techniques to raise recall, achieving a recall of around 67% with a precision of 92%.
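The abstract describes the co-occurrence criterion only at a high level. The following is a minimal sketch under stated assumptions, not the authors' system: a paradigm is reduced to a bare list of suffixes, and a (stem, paradigm) pair is accepted when enough of its generated forms appear in the corpus. The paradigm names, the toy suffix lists, the functions `candidate_stems` and `acquire`, and the `min_forms` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, heavily simplified paradigms: each maps to the inflectional
# suffixes a stem of that class can take. Real rule sets would also attach
# morpho-syntactic tags to every suffix.
PARADIGMS = {
    "noun_fem_a": ["а", "и", "е", "у", "ой"],
    "noun_masc_zero": ["", "а", "у", "ом", "е", "ы", "ов"],
}


def candidate_stems(word, suffixes):
    """Yield every non-empty stem obtained by stripping one suffix."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            yield word[: len(word) - len(suffix)]


def acquire(tokens, paradigms, min_forms=3):
    """Accept (stem, paradigm) pairs whose forms co-occur in the corpus.

    A candidate is kept when at least `min_forms` distinct forms generated
    by the paradigm are attested among the corpus tokens.
    """
    vocab = set(tokens)
    candidates = defaultdict(set)  # paradigm name -> candidate stems
    for word in vocab:
        for name, suffixes in paradigms.items():
            candidates[name].update(candidate_stems(word, suffixes))

    lexicon = {}
    for name, stems in candidates.items():
        suffixes = paradigms[name]
        for stem in stems:
            # Collect the forms of this paradigm that actually occur.
            attested = sorted({stem + s for s in suffixes if stem + s in vocab})
            if len(attested) >= min_forms:
                lexicon[(stem, name)] = attested
    return lexicon


corpus = "книга книги книгу книгой стол стола столом".split()
print(acquire(corpus, PARADIGMS))
```

On this toy corpus, `acquire` keeps `("книг", "noun_fem_a")` and `("стол", "noun_masc_zero")`. Lowering `min_forms` to 2 also accepts `("книг", "noun_masc_zero")`, supported only by книга and книгу, which also fit the feminine paradigm: a small instance of the rule interaction between paradigms that hurts precision. The sketch does not attempt the paper's rule analysis algorithm or its recall-raising techniques.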