UKP: computing semantic textual similarity by combining multiple content similarity measures

  • Authors:
  • Daniel Bär;Chris Biemann;Iryna Gurevych;Torsten Zesch

  • Affiliations:
  • Ubiquitous Knowledge Processing Lab (UKP-TUDA), Technische Universität Darmstadt;Ubiquitous Knowledge Processing Lab (UKP-TUDA), Technische Universität Darmstadt;Ubiquitous Knowledge Processing Lab (UKP-TUDA), Technische Universität Darmstadt and Ubiquitous Knowledge Processing Lab (UKP-DIPF) German Institute for Educational Research and Educational I ...;Ubiquitous Knowledge Processing Lab (UKP-TUDA), Technische Universität Darmstadt and Ubiquitous Knowledge Processing Lab (UKP-DIPF) German Institute for Educational Research and Educational I ...

  • Venue:
  • SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
  • Year:
  • 2012

Abstract

We present the UKP system, which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, fitted on the task's training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add lexemes to the texts, which alleviates lexical gaps. Our final models, one per dataset, each consist of a log-linear combination of about 20 features, selected from the more than 300 features implemented.
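
To make the idea of combining similarity measures concrete, the following is a minimal sketch in Python and not the UKP implementation. It computes three of the simpler measure types the abstract names (character n-gram overlap, word n-gram overlap, and a longest-common-subsequence ratio) and combines them with a weighted sum of log-transformed values. The function names, the Jaccard normalization, the `log(1 + x)` transform, and the weights are all assumptions made for illustration; in the system described above, the weights would be fitted by regression on the training data.

```python
import math


def char_ngrams(text, n):
    """Set of character n-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_ngrams(text, n):
    """Set of word n-grams (as tuples) of the lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(t1, t2):
    """Word-level LCS length, normalized by the longer text (an assumed normalization)."""
    a, b = t1.lower().split(), t2.lower().split()
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def features(t1, t2):
    """Feature vector of three simple similarity measures, each in [0, 1]."""
    return [
        jaccard(char_ngrams(t1, 3), char_ngrams(t2, 3)),
        jaccard(word_ngrams(t1, 1), word_ngrams(t2, 1)),
        lcs_similarity(t1, t2),
    ]


def log_linear_score(feats, weights, bias):
    """Weighted sum of log(1 + feature); log1p keeps zero-valued features finite."""
    return bias + sum(w * math.log1p(x) for w, x in zip(weights, feats))


# Hypothetical weights for illustration only; real weights come from training.
example_weights = [1.0, 2.0, 1.5]
score = log_linear_score(
    features("A man is playing a guitar", "A man plays the guitar"),
    example_weights,
    bias=0.0,
)
```

The more complex measures mentioned in the abstract, such as Explicit Semantic Analysis or word similarity from lexical-semantic resources, would enter the same combination as further entries in the feature vector.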