Clustering and matching headlines for automatic paraphrase acquisition

  • Authors: Sander Wubben, Antal van den Bosch, Emiel Krahmer, Erwin Marsi
  • Affiliation: Tilburg University, The Netherlands (all authors)
  • Venue: ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation
  • Year: 2009

Abstract

Developing a data-driven text rewriting algorithm for paraphrasing requires a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases: they tend to describe the same event in different ways, and they can easily be obtained from the web. We compare two methods of aligning headlines to construct such a corpus, one based on clustering and the other on pairwise similarity-based matching. We show that the latter performs better on the task of aligning paraphrastic headlines.
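
To give a rough idea of what pairwise similarity-based matching of headlines can look like, here is a minimal sketch. It is not the authors' implementation: the bag-of-words representation, the cosine similarity measure, the 0.5 threshold, and the function names `tokenize`, `cosine`, and `match_pairs` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_pairs(headlines, threshold=0.5):
    """Return (i, j, score) for every headline pair scoring at or above threshold.

    The threshold value is an illustrative assumption, not taken from the paper.
    """
    vectors = [Counter(tokenize(h)) for h in headlines]
    pairs = []
    for i, j in combinations(range(len(headlines)), 2):
        score = cosine(vectors[i], vectors[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Made-up headlines, used only to exercise the functions above.
    examples = [
        "Police investigate bank robbery in Tilburg",
        "Bank robbery in Tilburg investigated by police",
        "Stock markets close higher",
    ]
    for i, j, score in match_pairs(examples):
        print(f"{examples[i]!r} <-> {examples[j]!r} ({score:.2f})")
```

In a sketch like this, every pair of headlines is scored independently, so a headline can be aligned with several others or with none. A clustering approach would instead assign each headline to a single group first and take paraphrase pairs from within each group.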