Collecting paraphrase corpora from volunteer contributors

  • Authors: Timothy Chklovski
  • Affiliations: University of Southern California, Marina del Rey, CA
  • Venue: Proceedings of the 3rd International Conference on Knowledge Capture
  • Year: 2005

Abstract

Extensive and deep paraphrase corpora are important for a variety of natural language processing and user interaction tasks. In this paper, we present an approach that (i) collects multiple paraphrases per item from volunteers and (ii) incentivises those volunteers to contribute responsibly. The approach solicits paraphrases from Web volunteers in two ways: collecting new paraphrases with no prompting, and asking contributors to guess partially obfuscated paraphrases. To test the approach, we implemented an online game, 1001 Paraphrases (http://ai-games.org/paraphrase.html), and deployed it to collect 20,944 entries focused on paraphrases of 400 statements. The approach complements existing text extraction methods and has some inherent advantages of its own. We present and motivate our design, and share preliminary observations and lessons learned about the performance of the approach.