Re-examining machine translation metrics for paraphrase identification

  • Authors:
  • Nitin Madnani;Joel Tetreault;Martin Chodorow

  • Affiliations:
  • Educational Testing Service Princeton, NJ;Educational Testing Service Princeton, NJ;Hunter College of CUNY, New York, NY

  • Venue:
  • NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose to re-examine the hypothesis that automated metrics developed for MT evaluation can prove useful for paraphrase identification in light of the significant work on the development of new MT metrics over the last 4 years. We show that a meta-classifier trained using nothing but recent MT metrics outperforms all previous paraphrase identification approaches on the Microsoft Research Paraphrase corpus. In addition, we apply our system to a second corpus developed for the task of plagiarism detection and obtain extremely positive results. Finally, we conduct extensive error analysis and uncover the top systematic sources of error for a paraphrase identification approach relying solely on MT metrics. We release both the new dataset and the error analysis annotations for use by the community.