While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop: from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set. We develop a generator retraining method in which the domain-specific training data is produced automatically from state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, yielding a generator that achieves a BLEU score of 0.61 on our BNC test data.
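The retraining method described above can be sketched as a small pipeline: a parser automatically annotates in-domain (BNC) sentences, and the resulting (tree, sentence) pairs are added to the generator's gold WSJ training data. The sketch below is illustrative only; the `parse` stand-in, function names, and toy data are assumptions, not the authors' implementation.

```python
def parse(sentence):
    """Stand-in for a state-of-the-art parser: returns a flat
    right-branching bracketing as a dummy 'parse tree'.
    (A real system would use an actual treebank-trained parser.)"""
    words = sentence.split()
    tree = words[-1]
    for w in reversed(words[:-1]):
        tree = f"(X {w} {tree})"
    return tree

def build_training_data(wsj_pairs, bnc_sentences):
    """Combine gold WSJ (tree, sentence) pairs with automatically
    parsed in-domain BNC sentences; the combined set would then be
    used to retrain the generator."""
    auto_pairs = [(parse(s), s) for s in bnc_sentences]
    return wsj_pairs + auto_pairs

# Toy data, purely for illustration.
wsj = [("(S (NP stocks) (VP fell))", "stocks fell")]
bnc = ["the cat sat", "rain stopped play"]
data = build_training_data(wsj, bnc)
print(len(data))  # 3 pairs: 1 gold + 2 automatically parsed
```

The key design point is that no manual in-domain annotation is needed: parser output, though noisy, supplies the domain-specific training material.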