While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop: from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set. We develop a generator retraining method in which the domain-specific training data is produced automatically from state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, yielding a generator that achieves a BLEU score of 0.61 on our BNC test data.
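The retraining method described above can be sketched as a small pipeline: a parser automatically annotates in-domain (BNC) sentences, and the resulting (tree, sentence) pairs are added to the generator's gold WSJ training data. The sketch below is illustrative only; the `parse` stand-in, function names, and toy data are assumptions, not the authors' implementation.

```python
def parse(sentence):
    """Stand-in for a state-of-the-art parser: returns a flat
    right-branching bracketing as a dummy 'parse tree'.
    (A real system would use an actual treebank-trained parser.)"""
    words = sentence.split()
    tree = words[-1]
    for w in reversed(words[:-1]):
        tree = f"(X {w} {tree})"
    return tree

def build_training_data(wsj_pairs, bnc_sentences):
    """Combine gold WSJ (tree, sentence) pairs with automatically
    parsed in-domain BNC sentences; the combined set would then be
    used to retrain the generator."""
    auto_pairs = [(parse(s), s) for s in bnc_sentences]
    return wsj_pairs + auto_pairs

# Toy data, purely for illustration.
wsj = [("(S (NP stocks) (VP fell))", "stocks fell")]
bnc = ["the cat sat", "rain stopped play"]
data = build_training_data(wsj, bnc)
print(len(data))  # 3 pairs: 1 gold + 2 automatically parsed
```

The key design point is that no manual in-domain annotation is needed: parser output, though noisy, supplies the domain-specific training material.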