Semantic textual similarity using maximal weighted bipartite graph matching

  • Authors:
  • Sumit Bhagwani; Shrutiranjan Satapathy; Harish Karnick

  • Affiliations:
  • Computer Science and Engineering IIT Kanpur, Kanpur, India; Computer Science and Engineering IIT Kanpur, Kanpur, India; Computer Science and Engineering IIT Kanpur, Kanpur, India

  • Venue:
  • SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
  • Year:
  • 2012

Abstract

This paper presents a system that measures the degree of semantic equivalence between two sentences. At its core, the approach grades the similarity of two sentences by finding the maximal weighted bipartite matching between their tokens. Tokens are single words, or multi-word units in the case of named entities and adjectivally or numerically modified words. Two token similarity measures are used: a WordNet-based similarity, and a statistical word similarity measure intended to overcome the shortcomings of the WordNet-based measure. Across the three systems built for the task, we explore a simple bag-of-words tokenization scheme; a more careful tokenization scheme that captures named entities, times, dates, monetary entities, etc.; and, finally, an attempt to capture the context around tokens using grammatical dependencies.
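The central idea, scoring two sentences by a maximum weighted bipartite matching over their tokens, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the exact-match `toy_similarity` (a stand-in for the WordNet-based and statistical measures), and the normalization by the longer sentence's token count are all assumptions made for illustration. The matching itself is computed with a standard Hungarian (Kuhn-Munkres) routine.

```python
from typing import Callable, List, Sequence


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical token similarity in [0, 1]: 1.0 on a case-insensitive
    exact match, else 0.0. A real system would plug in a lexical measure."""
    return 1.0 if a.lower() == b.lower() else 0.0


def max_weight_matching(weights: List[List[float]]) -> float:
    """Total weight of a maximum weighted bipartite matching.

    Hungarian algorithm run on negated weights. Assumes non-negative
    weights and no more rows than columns.
    """
    n = len(weights)
    m = len(weights[0]) if n else 0
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row matched to column j (1-based, 0 = none)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path just found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(weights[p[j] - 1][j - 1] for j in range(1, m + 1) if p[j])


def sentence_similarity(
    tokens_a: Sequence[str],
    tokens_b: Sequence[str],
    sim: Callable[[str, str], float] = toy_similarity,
) -> float:
    """Matching weight divided by the longer sentence's token count
    (the normalization is an assumption, not taken from the paper)."""
    if not tokens_a or not tokens_b:
        return 0.0
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a  # keep rows <= columns
    weights = [[sim(a, b) for b in tokens_b] for a in tokens_a]
    return max_weight_matching(weights) / len(tokens_b)
```

For example, `sentence_similarity(["the", "cat", "sat"], ["cat", "sat", "down", "quietly"])` matches two token pairs exactly and divides by four tokens. An optimal matching differs from pairing each token greedily with its best partner: with graded similarities, a greedy choice can block a better overall assignment, and the matching avoids this.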