Semantic text similarity using corpus-based word similarity and string similarity

  • Authors:
  • Aminul Islam; Diana Inkpen

  • Affiliations:
  • University of Ottawa, ON, Canada; University of Ottawa, ON, Canada

  • Venue:
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • Year:
  • 2008

Abstract

We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
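
To make the string-matching component concrete, here is a minimal sketch of a length-normalized Longest Common Subsequence score. The abstract does not give the formula, so the normalization used here (squared LCS length divided by the product of the two string lengths) is an assumption for illustration, as are the function names `lcs_length` and `normalized_lcs`. The paper's modified LCS variants and its corpus-based word similarity measure are not covered.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either string, keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """LCS score scaled to [0, 1].

    Assumed normalization: len(LCS)^2 / (len(a) * len(b)).
    Empty input is treated as having zero similarity.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


if __name__ == "__main__":
    print(normalized_lcs("similarity", "similarly"))
```

Squaring the LCS length keeps the score at 1.0 for identical strings while penalizing pairs whose shared subsequence is short relative to either string, so a long string cannot score highly just by containing a short one.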