Bootstrapping statistical parsers from small datasets

  • Authors and affiliations:
  • Mark Steedman (University of Edinburgh)
  • Miles Osborne (University of Edinburgh)
  • Anoop Sarkar (Simon Fraser University)
  • Stephen Clark (University of Edinburgh)
  • Rebecca Hwa (University of Maryland)
  • Julia Hockenmaier (University of Edinburgh)
  • Paul Ruhlen (Johns Hopkins University)
  • Steven Baker (Cornell University)
  • Jeremiah Crim (Johns Hopkins University)

  • Venue:
  • EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
  • Year:
  • 2003

Abstract

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.
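
For readers unfamiliar with co-training, the general idea can be illustrated with a minimal sketch. This is not the method or the parsers used in the paper: it is a generic co-training loop in which two toy classifiers, each looking at a different feature "view", stand in for two statistical parsers. The names `ViewModel`, `co_train`, `rounds`, and `threshold` are hypothetical, and the confidence-threshold selection rule is an assumption made for illustration only.

```python
from collections import Counter, defaultdict


class ViewModel:
    """Toy stand-in for a statistical parser (hypothetical).

    It predicts a label from a single feature position (its "view").
    """

    def __init__(self, view):
        self.view = view  # index of the feature this model looks at
        self.counts = defaultdict(Counter)

    def train(self, labelled):
        # Retrain from scratch on (features, label) pairs.
        self.counts = defaultdict(Counter)
        for features, label in labelled:
            self.counts[features[self.view]][label] += 1

    def predict(self, features):
        """Return (label, confidence), or (None, 0.0) for an unseen value."""
        seen = self.counts.get(features[self.view])
        if not seen:
            return None, 0.0
        label, n = seen.most_common(1)[0]
        return label, n / sum(seen.values())


def co_train(seed, pool, rounds=5, threshold=0.9):
    """Generic co-training: each model labels raw items for the other."""
    models = [ViewModel(0), ViewModel(1)]
    train_sets = [list(seed), list(seed)]  # both start from the seed data
    pool = list(pool)
    for _ in range(rounds):
        for model, data in zip(models, train_sets):
            model.train(data)
        remaining = []
        for x in pool:
            used = False
            for i, teacher in enumerate(models):
                label, conf = teacher.predict(x)
                if label is not None and conf >= threshold:
                    # A confident prediction becomes training data
                    # for the *other* model.
                    train_sets[1 - i].append((x, label))
                    used = True
            if not used:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # nothing new was labelled in this round
        pool = remaining
    # Final retraining on the seed plus all automatically labelled items.
    for model, data in zip(models, train_sets):
        model.train(data)
    return models
```

In this sketch the small labelled `seed` plays the role of the manually parsed material and `pool` plays the role of the raw sentences. Each round, whatever one model labels confidently is added to the other model's training set, so both models can end up covering cases that the seed data alone did not.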