Semi-supervised training for the averaged perceptron POS tagger

  • Authors:
  • Drahomíra "johanka" Spoustová; Jan Hajič; Jan Raab; Miroslav Spousta

  • Affiliations:
  • Charles University Prague, Czech Republic; Charles University Prague, Czech Republic; Charles University Prague, Czech Republic; Charles University Prague, Czech Republic

  • Venue:
  • EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2009

Abstract

This paper describes POS tagging experiments with semi-supervised training as an extension to the (supervised) averaged perceptron algorithm, first introduced for this task by Collins (2002). Experiments with iterative training on a standard-sized supervised (manually annotated) dataset (10^6 tokens) combined with a relatively modest amount (on the order of 10^8 tokens) of unsupervised (plain) data in a bagging-like fashion showed significant improvement on the POS classification task for typologically different languages, yielding better than state-of-the-art results for English and Czech (4.12% and 4.86% relative error reduction, respectively; absolute accuracies being 97.44% and 95.89%).
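
The abstract reports both relative error reductions and absolute accuracies, and the arithmetic linking the two can be sketched as below. This is only an illustration: the function names are hypothetical, and the "implied baseline" values are back-computed from the rounded figures in the abstract. They are not numbers reported in the abstract itself, and the comparison system the authors actually used may differ.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Relative error reduction (in %) when accuracy rises from
    baseline_acc to new_acc (both given in %)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, rer: float) -> float:
    """Invert the formula above: the baseline accuracy (in %) implied by
    a new accuracy new_acc and a relative error reduction rer (both in %)."""
    new_err = 100.0 - new_acc
    baseline_err = new_err / (1.0 - rer / 100.0)
    return 100.0 - baseline_err


# Figures taken from the abstract: (accuracy in %, relative error reduction in %).
reported = {"English": (97.44, 4.12), "Czech": (95.89, 4.86)}

for language, (acc, rer) in reported.items():
    # Back-computed estimate, an assumption of this sketch rather than a reported result.
    baseline = implied_baseline_accuracy(acc, rer)
    print(f"{language}: implied baseline accuracy is about {baseline:.2f} %")
```

Because the remaining error is small at this accuracy level, a relative error reduction of a few percent corresponds to an absolute accuracy gain of only a fraction of a percentage point.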