Adapting WSJ-trained parsers to the British National Corpus using in-domain self-training

  • Authors: Jennifer Foster; Joachim Wagner; Djamé Seddah; Josef van Genabith
  • Affiliation (all authors): Dublin City University, Dublin, Ireland
  • Venue: IWPT '07: Proceedings of the 10th International Conference on Parsing Technologies
  • Year: 2007

Abstract

We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson's reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
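The self-training procedure summarized in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' code and not the interface of Charniak and Johnson's parser: `self_train`, `train_parser`, and the `Tree` and `Parser` type aliases are hypothetical names introduced here, and trees are assumed to be plain bracketed strings.

```python
from typing import Callable, Iterable, List, Sequence

# Assumption: a parse tree is represented as a bracketed string,
# e.g. "(S (NP ...) (VP ...))". A parser maps a sentence to such a tree.
Tree = str
Parser = Callable[[str], Tree]


def self_train(
    train_parser: Callable[[Sequence[Tree]], Parser],
    gold_trees: Sequence[Tree],
    unlabeled_sentences: Iterable[str],
) -> Parser:
    """Run one round of self-training and return the retrained parser.

    `train_parser` is a hypothetical callable that builds a parser from
    a collection of trees; it stands in for whatever training routine
    the real parser provides.
    """
    # 1. Train a baseline parser on the hand-annotated (gold) trees only.
    base_parser = train_parser(gold_trees)

    # 2. Parse the unlabeled in-domain sentences with that baseline parser.
    auto_trees: List[Tree] = [base_parser(s) for s in unlabeled_sentences]

    # 3. Retrain on the gold trees combined with the automatic parses.
    combined = list(gold_trees) + auto_trees
    return train_parser(combined)
```

In the setting the abstract describes, `gold_trees` would correspond to the original WSJ training data and `unlabeled_sentences` to the BNC sentences. The retrained parser would then be evaluated on held-out gold trees such as WSJ Section 23 and the new BNC gold standard set.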