Parser-based retraining for domain adaptation of probabilistic generators

  • Authors:
  • Deirdre Hogan; Jennifer Foster; Joachim Wagner; Josef van Genabith

  • Affiliations:
  • Dublin City University, Ireland; Dublin City University, Ireland; Dublin City University, Ireland; Dublin City University, Ireland

  • Venue:
  • INLG '08 Proceedings of the Fifth International Natural Language Generation Conference
  • Year:
  • 2008

Abstract

While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method in which the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
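
To make "a substantial portion of the performance drop" concrete, the three BLEU scores reported in the abstract can be combined into a single recovery ratio. The sketch below is only illustrative arithmetic on those reported figures; the function name `recovered_fraction` and the variable names are assumptions introduced here and do not come from the paper.

```python
def recovered_fraction(in_domain: float, out_of_domain: float, retrained: float) -> float:
    """Share of the in-domain to out-of-domain score drop that retraining wins back.

    Hypothetical helper for illustration; not part of the paper's method.
    """
    drop = in_domain - out_of_domain
    if drop <= 0:
        raise ValueError("expected the out-of-domain score to be lower than the in-domain score")
    return (retrained - out_of_domain) / drop


# BLEU scores as reported in the abstract.
WSJ_BLEU = 0.66            # Penn-Treebank-trained generator on the WSJ test set
BNC_BLEU_BASELINE = 0.54   # same generator on the BNC test set
BNC_BLEU_RETRAINED = 0.61  # retrained generator on the BNC test set

fraction = recovered_fraction(WSJ_BLEU, BNC_BLEU_BASELINE, BNC_BLEU_RETRAINED)
print(f"Drop: {WSJ_BLEU - BNC_BLEU_BASELINE:.2f} BLEU, recovered: {fraction:.0%}")
```

On the reported scores, the drop is 0.12 BLEU and retraining recovers 0.07 of it, a little under three fifths.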