Validation of sub-sentential paraphrases acquired from parallel monolingual corpora

  • Authors:
  • Houda Bouamor;Aurélien Max;Anne Vilnat

  • Affiliations:
  • LIMSI-CNRS & Univ. Paris Sud Orsay, France;LIMSI-CNRS & Univ. Paris Sud Orsay, France;LIMSI-CNRS & Univ. Paris Sud Orsay, France

  • Venue:
  • EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
  • Year:
  • 2012

Abstract

The task of paraphrase acquisition from related sentences can be tackled by a variety of techniques that make use of various types of knowledge. In this work, we hypothesize that their performance can be increased if candidate paraphrases are validated using information that characterizes paraphrases independently of the set of techniques that proposed them. We implement this as a two-class classification problem (i.e. paraphrase vs. not paraphrase), allowing any paraphrase acquisition technique to be easily integrated into the combination system. We report experiments on two languages, English and French, with 5 individual techniques on parallel monolingual corpora obtained via multiple translation, and a large set of classification features ranging from surface to contextual similarity measures. Relative improvements in F-measure close to 18% are obtained on both languages over the best performing techniques.
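
As a rough illustration of the validation setup described in the abstract, the sketch below pools candidate pairs from several acquisition techniques, builds a feature vector for each pair, and scores a predicted set with F-measure. It is a minimal sketch under stated assumptions, not the authors' system: the function names (`pool_candidates`, `features`, `f_measure`), the technique labels, and the two toy features (a character-level surface similarity and a token length ratio) are all hypothetical stand-ins for the paper's techniques and feature set. Any two-class classifier could then be trained on such vectors.

```python
from difflib import SequenceMatcher


def pool_candidates(outputs):
    """Union the candidate pairs proposed by each technique.

    outputs: dict mapping a (hypothetical) technique name to a set of
    (phrase_a, phrase_b) pairs. Returns a dict mapping each pair to the
    set of technique names that proposed it.
    """
    pooled = {}
    for name, pairs in outputs.items():
        for pair in pairs:
            pooled.setdefault(pair, set()).add(name)
    return pooled


def features(pair, proposers, technique_names):
    """Build a feature vector for one candidate pair (illustrative features only)."""
    a, b = pair
    # One binary feature per technique: did it propose this pair?
    votes = [1.0 if t in proposers else 0.0 for t in technique_names]
    # Toy surface similarity: character-level matching ratio in [0, 1].
    surface = SequenceMatcher(None, a, b).ratio()
    # Token length ratio, shorter over longer; the 1 guards against empty phrases.
    la, lb = len(a.split()), len(b.split())
    length_ratio = min(la, lb) / max(la, lb, 1)
    return votes + [surface, length_ratio]


def f_measure(predicted, gold):
    """Balanced F-measure of a predicted set of pairs against a gold set."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up technique outputs.
outputs = {
    "technique_1": {("a b", "a c")},
    "technique_2": {("a b", "a c"), ("x", "y")},
}
pooled = pool_candidates(outputs)
names = sorted(outputs)
vectors = {pair: features(pair, who, names) for pair, who in pooled.items()}
```

Because the per-technique features only record which techniques proposed a pair, adding a further technique in this sketch only adds one more entry to `outputs` and one more column to each vector.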