Bootstrapping morphological analyzers by combining human elicitation and machine learning

  • Authors: Kemal Oflazer; Sergei Nirenburg; Marjorie McShane

  • Affiliations: Sabanci University; New Mexico State University; New Mexico State University

  • Venue: Computational Linguistics
  • Year: 2001

Abstract

This paper presents a semiautomatic technique for developing broad-coverage finite-state morphological analyzers for use in natural language processing applications. It consists of three components: elicitation of linguistic information from humans, a machine learning bootstrapping scheme, and a testing environment. The three components are applied iteratively until a threshold of output quality is attained. The initial application of this technique is for the morphology of low-density languages in the context of the Expedition project at NMSU Computing Research Laboratory. This elicit-build-test technique compiles lexical and inflectional information elicited from a human into a finite-state transducer lexicon and combines this with a sequence of morphographemic rewrite rules that is induced using transformation-based learning from the elicited examples. The resulting morphological analyzer is then tested against a test set, and any corrections are fed back into the learning procedure, which then builds an improved analyzer.
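
The elicit-build-test cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes plain string rewriting in place of a compiled finite-state transducer, works only in the generation direction (lexical form to surface form), and uses a simple greedy rule search as a stand-in for transformation-based learning. All names (`generate`, `candidate_rules`, `learn_rules`, `elicit_build_test`, `SEED`, `TEST_SET`) and the English plural data are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (lexical form with "+" morpheme boundary, surface form)
Rule = Tuple[str, str]     # (pattern, replacement), applied in sequence

def generate(lexical: str, rules: List[Rule]) -> str:
    """Apply the ordered rewrite rules, then erase any remaining boundary."""
    form = lexical
    for pattern, replacement in rules:
        form = form.replace(pattern, replacement)
    return form.replace("+", "")

def candidate_rules(lexical: str, surface: str, max_context: int = 2) -> List[Rule]:
    """Propose rules that rewrite the mismatching tail, with 0..max_context
    characters of left context (a simplifying assumption of this sketch)."""
    i = 0
    while i < len(lexical) and i < len(surface) and lexical[i] == surface[i]:
        i += 1
    rules: List[Rule] = []
    for k in range(max_context + 1):
        start = max(0, i - k)
        rule = (lexical[start:], surface[start:])
        if rule[0] and rule not in rules:
            rules.append(rule)
    return rules

def learn_rules(examples: List[Example], max_rules: int = 10) -> List[Rule]:
    """Greedy error-driven learning: repeatedly append the candidate rule
    that most increases the number of correctly generated examples."""
    def n_correct(rs: List[Rule]) -> int:
        return sum(generate(lex, rs) == surf for lex, surf in examples)

    rules: List[Rule] = []
    for _ in range(max_rules):
        errors = [(l, s) for l, s in examples if generate(l, rules) != s]
        candidates = {r for l, s in errors for r in candidate_rules(l, s)}
        best, best_score = None, n_correct(rules)
        # Shorter (more general) patterns are tried first and win ties.
        for rule in sorted(candidates, key=lambda r: (len(r[0]), r)):
            score = n_correct(rules + [rule])
            if score > best_score:
                best, best_score = rule, score
        if best is None:
            break
        rules.append(best)
    return rules

def elicit_build_test(seed: List[Example], test_set: List[Example],
                      threshold: float = 1.0, max_rounds: int = 5):
    """Build from elicited examples, test, feed corrections back, repeat."""
    examples = list(seed)
    rules: List[Rule] = []
    accuracy, round_no = 0.0, 0
    for round_no in range(1, max_rounds + 1):
        rules = learn_rules(examples)                       # build
        failures = [(l, s) for l, s in test_set
                    if generate(l, rules) != s]             # test
        accuracy = (len(test_set) - len(failures)) / len(test_set)
        if accuracy >= threshold:
            break
        # In the real setting a human would supply these corrections.
        examples.extend(f for f in failures if f not in examples)
    return rules, accuracy, round_no

# Hypothetical elicited examples and held-out test items.
SEED = [("cat+s", "cats"), ("dog+s", "dogs"),
        ("fox+s", "foxes"), ("city+s", "cities")]
TEST_SET = [("box+s", "boxes"), ("party+s", "parties"),
            ("boy+s", "boys"), ("day+s", "days")]

if __name__ == "__main__":
    print(elicit_build_test(SEED, TEST_SET))
```

On this data the first round overgeneralizes (it rewrites every `y+s` as `ies`, so `boy+s` and `day+s` fail the test); once those corrections are fed back, the second round learns the narrower `ty+s` rule and reaches the threshold. The point of the sketch is only the feedback loop, not the quality of the learner.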