TakeLab: systems for measuring semantic text similarity

  • Authors:
  • Frane Šarić;Goran Glavaš;Mladen Karan;Jan Šnajder;Bojana Dalbelo Bašić

  • Affiliations:
  • University of Zagreb;University of Zagreb;University of Zagreb;University of Zagreb;University of Zagreb

  • Venue:
  • SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes the two systems for determining the semantic similarity of short texts submitted to the SemEval 2012 Task 6. Most of the research on semantic similarity of textual content focuses on large documents. However, a fair amount of information is condensed into short text snippets such as social media posts, image captions, and scientific abstracts. We predict the human ratings of sentence similarity using a support vector regression model with multiple features measuring word-overlap similarity and syntax similarity. Out of 89 systems submitted, our two systems ranked in the top 5, for the three overall evaluation metrics used (overall Pearson -- 2nd and 3rd, normalized Pearson -- 1st and 3rd, weighted mean -- 2nd and 5th).