Re-examining machine translation metrics for paraphrase identification

Authors:
Nitin Madnani; Joel Tetreault; Martin Chodorow
Affiliations:
Educational Testing Service, Princeton, NJ; Educational Testing Service, Princeton, NJ; Hunter College of CUNY, New York, NY
Venue:
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Year:
2012

Abstract

We propose to re-examine the hypothesis that automated metrics developed for MT evaluation can prove useful for paraphrase identification in light of the significant work on the development of new MT metrics over the last 4 years. We show that a meta-classifier trained using nothing but recent MT metrics outperforms all previous paraphrase identification approaches on the Microsoft Research Paraphrase corpus. In addition, we apply our system to a second corpus developed for the task of plagiarism detection and obtain extremely positive results. Finally, we conduct extensive error analysis and uncover the top systematic sources of error for a paraphrase identification approach relying solely on MT metrics. We release both the new dataset and the error analysis annotations for use by the community.
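
As a rough illustration of the idea of using MT metric scores as classifier features, the sketch below turns a sentence pair into a small feature vector. It is only a toy under stated assumptions: `ngram_precision` and `word_edit_rate` are simplified stand-ins (BLEU-style clipped n-gram precision and a TER-like word edit rate without shifts), not the metrics used in the paper, and scoring the pair in both directions is an assumption made here because such metrics are asymmetric. The function names are hypothetical, and the classifier that would consume these features is not shown.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision of candidate against reference (BLEU-style stand-in)."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def word_edit_rate(candidate, reference):
    """Word-level Levenshtein distance over reference length (TER-like, no shifts)."""
    prev = list(range(len(reference) + 1))
    for i, c in enumerate(candidate, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            curr.append(min(prev[j] + 1,              # delete a candidate word
                            curr[j - 1] + 1,          # insert a reference word
                            prev[j - 1] + (c != r)))  # match or substitute
        prev = curr
    return prev[-1] / max(len(reference), 1)


def metric_features(sentence_1, sentence_2):
    """Score the pair in both directions and return one flat feature vector."""
    a, b = sentence_1.lower().split(), sentence_2.lower().split()
    features = []
    for cand, ref in ((a, b), (b, a)):
        features.append(ngram_precision(cand, ref, 1))
        features.append(ngram_precision(cand, ref, 2))
        features.append(word_edit_rate(cand, ref))
    return features
```

A vector like this, computed for every labeled sentence pair in a corpus, could then be passed to any off-the-shelf classifier to predict whether the pair is a paraphrase.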