Avoiding repetition in generated text

  • Authors:
  • Mary Ellen Foster; Michael White

  • Affiliations:
  • Technische Universität München, Garching, Germany; The Ohio State University, Columbus, OH

  • Venue:
  • ENLG '07 Proceedings of the Eleventh European Workshop on Natural Language Generation
  • Year:
  • 2007

Abstract

We investigate two methods for enhancing variation in the output of a stochastic surface realiser: choosing from among the highest-scoring realisation candidates instead of taking the single highest-scoring result (ε-best sampling), and penalising the words from earlier sentences in a discourse when generating later ones (anti-repetition scoring). In a human evaluation study, subjects were asked to compare texts generated with and without the variation enhancements. Strikingly, subjects judged the texts generated using these two methods to be better written and less repetitive than the texts generated with optimal n-gram scoring; at the same time, no significant difference in understandability was found between the two versions. In analysing the two methods, we show that the simpler ε-best sampling method is considerably more prone to introducing dispreferred variants into the output, indicating that best results can be obtained using anti-repetition scoring with strict or no ε-best sampling.
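The two variation methods described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the score values, and the per-word penalty are hypothetical, and the realiser's actual n-gram scoring is abstracted into a plain (candidate, score) list.

```python
import random

def epsilon_best_sample(candidates, epsilon=0.05, rng=random):
    """epsilon-best sampling: choose uniformly at random among
    realisation candidates whose score falls within epsilon of the
    best score, instead of always returning the single top result."""
    best = max(score for _, score in candidates)
    near_best = [text for text, score in candidates if score >= best - epsilon]
    return rng.choice(near_best)

def anti_repetition_score(candidate_words, prior_words, base_score, penalty=0.05):
    """Anti-repetition scoring: subtract a penalty (illustrative value)
    from a candidate's base score for each word it repeats from
    earlier sentences in the discourse."""
    repeats = sum(1 for word in candidate_words if word in prior_words)
    return base_score - penalty * repeats

# Example: with epsilon = 0.05, both 0.95- and 0.93-scoring candidates
# are eligible, so the output varies between them across calls.
candidates = [("the cat sat", 0.95), ("a cat sat", 0.93), ("the cat rested", 0.80)]
print(epsilon_best_sample(candidates, epsilon=0.05))
```

Setting `epsilon=0` recovers the standard single-best behaviour, which matches the paper's finding that strict or no ε-best sampling combined with anti-repetition scoring works best.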