Avoiding repetition in generated text

  • Authors:
  • Mary Ellen Foster; Michael White

  • Affiliations:
  • Technische Universität München, Garching, Germany; The Ohio State University, Columbus, OH

  • Venue:
  • ENLG '07 Proceedings of the Eleventh European Workshop on Natural Language Generation
  • Year:
  • 2007

Abstract

We investigate two methods for enhancing variation in the output of a stochastic surface realiser: choosing from among the highest-scoring realisation candidates instead of taking the single highest-scoring result (ε-best sampling), and penalising the words from earlier sentences in a discourse when generating later ones (anti-repetition scoring). In a human evaluation study, subjects were asked to compare texts generated with and without the variation enhancements. Strikingly, subjects judged the texts generated using these two methods to be better written and less repetitive than the texts generated with optimal n-gram scoring; at the same time, no significant difference in understandability was found between the two versions. In analysing the two methods, we show that the simpler ε-best sampling method is considerably more prone to introducing dispreferred variants into the output, indicating that best results can be obtained using anti-repetition scoring with strict or no ε-best sampling.
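The two variation methods described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the score values, and the per-word penalty are hypothetical, and the realiser's actual n-gram scoring is abstracted into a plain (candidate, score) list.

```python
import random

def epsilon_best_sample(candidates, epsilon=0.05, rng=random):
    """epsilon-best sampling: choose uniformly at random among
    realisation candidates whose score falls within epsilon of the
    best score, instead of always returning the single top result."""
    best = max(score for _, score in candidates)
    near_best = [text for text, score in candidates if score >= best - epsilon]
    return rng.choice(near_best)

def anti_repetition_score(candidate_words, prior_words, base_score, penalty=0.05):
    """Anti-repetition scoring: subtract a penalty (illustrative value)
    from a candidate's base score for each word it repeats from
    earlier sentences in the discourse."""
    repeats = sum(1 for word in candidate_words if word in prior_words)
    return base_score - penalty * repeats

# Example: with epsilon = 0.05, both 0.95- and 0.93-scoring candidates
# are eligible, so the output varies between them across calls.
candidates = [("the cat sat", 0.95), ("a cat sat", 0.93), ("the cat rested", 0.80)]
print(epsilon_best_sample(candidates, epsilon=0.05))
```

Setting `epsilon=0` recovers the standard single-best behaviour, which matches the paper's finding that strict or no ε-best sampling combined with anti-repetition scoring works best.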