Stanford: probabilistic edit distance metrics for STS

Authors:
Mengqiu Wang; Daniel Cer
Affiliations:
Stanford University, Stanford, CA; Stanford University, Stanford, CA
Venue:
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Year:
2012

Abstract

This paper describes Stanford University's submission to the SemEval 2012 Semantic Textual Similarity (STS) shared evaluation task. Our proposed metric computes a probabilistic edit distance and uses it as a prediction of semantic similarity. We learn a weighted edit distance in a probabilistic finite state machine (pFSM) model, where state transitions correspond to edit operations. While standard edit distance models cannot capture long-distance word swapping or cross alignments, we rectify these shortcomings using a novel pushdown automaton extension of the pFSM model (pPDA). Our models are trained in a regression framework and can easily incorporate a rich set of linguistic features. We contrast the performance of our edit-distance-based models with an adaptation of the Stanford textual entailment system to the STS task. Our results show that the most advanced edit distance model, pPDA, outperforms our entailment system on all but one of the genres included in the STS task.
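
To make the notion of a weighted edit distance concrete, the following is a minimal sketch of the classic dynamic-programming computation over word tokens. It is not the pFSM or pPDA model described in the abstract: it takes the single cheapest edit sequence rather than modeling edit sequences probabilistically, and its three cost parameters (`insert_cost`, `delete_cost`, `substitute_cost`) are fixed, hypothetical values rather than weights learned in a regression framework. The function name is likewise an assumption made for illustration.

```python
from typing import Sequence


def weighted_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    insert_cost: float = 1.0,      # hypothetical fixed weight, not a learned one
    delete_cost: float = 1.0,      # hypothetical fixed weight, not a learned one
    substitute_cost: float = 1.0,  # hypothetical fixed weight, not a learned one
) -> float:
    """Return the minimum total cost of editing `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] holds the cheapest cost of turning source[:i] into target[:j].
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * delete_cost   # delete every remaining source token
    for j in range(1, cols):
        dist[0][j] = j * insert_cost   # insert every remaining target token
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + (0.0 if same else substitute_cost),
            )
    return dist[-1][-1]


if __name__ == "__main__":
    a = "john saw mary".split()
    b = "mary saw john".split()
    # A word swap is charged as two independent substitutions.
    print(weighted_edit_distance(a, b))
```

With unit costs, the swapped pair in the usage example is charged as two separate substitutions, since this plain formulation has no edit operation for moving a word. That is the kind of limitation on word swapping and cross alignments that the abstract attributes to standard edit distance models.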