Semi-supervised training of a statistical parser from unlabeled partially-bracketed data

  • Authors:
  • Rebecca Watson; Ted Briscoe; John Carroll

  • Affiliations:
  • University of Cambridge, UK; University of Cambridge, UK; University of Sussex, UK

  • Venue:
  • IWPT '07: Proceedings of the 10th International Conference on Parsing Technologies
  • Year:
  • 2007

Abstract

We compare the accuracy of a statistical parse ranking model trained on a fully-annotated portion of the Susanne treebank with one trained on unlabeled, partially-bracketed sentences derived from this treebank and from the Penn Treebank. We demonstrate that confidence-based semi-supervised techniques similar to self-training outperform expectation maximization (EM) when both are constrained by partial bracketing. Both methods based on partially-bracketed training data outperform the fully supervised technique, and both can, in principle, be applied to any statistical parser whose output is consistent with such partial bracketing. We also explore tuning the model to a different domain and the effect of in-domain data in the semi-supervised training processes.
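The idea of a parse being "consistent with" a partial bracketing can be illustrated with a small example. What follows is a minimal Python sketch, not the authors' implementation, and it rests on several assumptions. Constituents and brackets are represented as half-open token spans `(start, end)`. "Consistent" is taken to mean that no constituent of the parse crosses any given bracket. The parser's output is assumed to be an n-best list of `(score, spans)` pairs. The helper names `crosses`, `consistent` and `filter_nbest` are hypothetical. Renormalising the scores of the surviving parses is shown as one simple way to weight them, in the spirit of confidence-based training. It is not necessarily the weighting used in the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a half-open token interval [start, end).
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without either one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def consistent(parse_spans: Iterable[Span], brackets: Iterable[Span]) -> bool:
    """Assumed definition: a parse is consistent with a partial bracketing
    if none of its constituents crosses any of the given brackets."""
    bracket_list = list(brackets)
    return not any(crosses(p, b) for p in parse_spans for b in bracket_list)


def filter_nbest(
    nbest: List[Tuple[float, List[Span]]], brackets: List[Span]
) -> List[Tuple[float, List[Span]]]:
    """Keep only the parses consistent with the bracketing, and rescale
    their (non-negative) scores into weights that sum to 1."""
    kept = [(score, spans) for score, spans in nbest if consistent(spans, brackets)]
    total = sum(score for score, _ in kept)
    if total <= 0:
        return []  # no usable consistent parse for this sentence
    return [(score / total, spans) for score, spans in kept]


# Usage: a 4-token sentence whose only known bracket covers tokens 0-1.
nbest = [
    (0.6, [(0, 4), (0, 2), (2, 4)]),  # nests with the bracket: kept
    (0.4, [(0, 4), (1, 3)]),          # (1, 3) crosses (0, 2): discarded
]
print(filter_nbest(nbest, [(0, 2)]))
```

Here the second analysis is discarded because its constituent `(1, 3)` crosses the bracket `(0, 2)`. The first analysis then receives all of the weight. Under this definition, a partial bracketing only rules analyses out. It never requires a bracket to appear as a constituent. This permissiveness is what allows unlabeled, partially-bracketed sentences to constrain training without full annotation.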