Evaluating automation strategies in language documentation

  • Authors:
  • Alexis Palmer; Taesun Moon; Jason Baldridge

  • Affiliations:
  • The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX; The University of Texas at Austin, Austin, TX

  • Venue:
  • HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
  • Year:
  • 2009

Abstract

This paper presents pilot work integrating machine labeling and active learning with human annotation of data for the language documentation task of creating interlinearized glossed text (IGT) for the Mayan language Uspanteko. The practical goal is to produce a fully annotated corpus that is as accurate as possible given limited time for manual annotation. We describe ongoing pilot studies that examine how three main factors affect the time needed to annotate IGT: suggestions from a machine labeler, sample selection methods, and annotator expertise.