Acquiring paraphrases from text corpora

  • Authors:
  • Rahul Bhagat; Eduard Hovy; Siddharth Patwardhan

  • Affiliations:
  • USC Information Sciences Institute, Marina del Rey, CA, USA; USC Information Sciences Institute, Marina del Rey, CA, USA; University of Utah, Salt Lake City, UT, USA

  • Venue:
  • Proceedings of the fifth international conference on Knowledge capture
  • Year:
  • 2009

Abstract

Paraphrases are textual expressions that convey the same meaning using different surface forms. Capturing the variability of language, they play an important role in many natural language applications including question answering, machine translation, and multi-document summarization. In linguistics, paraphrases are characterized by approximate conceptual equivalence. Since no automated semantic interpretation systems available today can identify conceptual equivalence, paraphrases are difficult to acquire without human effort. In this paper, we present a method for automatically acquiring paraphrases using a monolingual corpus. We learn paraphrases at both the surface and lexico-syntactic levels and build two paraphrase resources, each containing about 2 million phrases. We evaluate these paraphrases extrinsically by using them to learn patterns for Information Extraction (IE). We show that the lexico-syntactic paraphrases perform better than the surface-level paraphrases for IE. We further show that the patterns learned using the lexico-syntactic paraphrases attain performance comparable to the traditional IE approach of learning patterns from domain-specific corpora.