A Random Walk Framework to Compute Textual Semantic Similarity: A Unified Model for Three Benchmark Tasks

  • Authors:
  • Majid Yazdani; Andrei Popescu-Belis

  • Venue:
  • ICSC '10: Proceedings of the 2010 IEEE Fourth International Conference on Semantic Computing
  • Year:
  • 2010

Abstract

A network of concepts is built from Wikipedia documents, and a random walk approach is used to compute distances between documents in this network. Three algorithms for distance computation are considered: hitting/commute time, personalized PageRank, and truncated visiting probability. In parallel, four types of weighted links in the document network are considered: actual hyperlinks, lexical similarity, common category membership, and common template use. The resulting network is used to solve three benchmark semantic tasks – word similarity, paraphrase detection between sentences, and document similarity – by mapping each pair of items to the network and then computing a distance between their representations. The model reaches state-of-the-art performance on each task, showing that the constructed network is a general, valuable resource for semantic similarity judgments.
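
To make one of the three distance algorithms concrete, the following is a minimal sketch of personalized PageRank computed by power iteration on a small weighted graph. It is not the authors' implementation: the function name `personalized_pagerank`, the toy graph, its link weights, and the damping value of 0.85 are all assumptions chosen for illustration, and the abstract does not say how the four link types are combined into a single weight.

```python
def personalized_pagerank(weights, seed, damping=0.85, iterations=100):
    """Personalized PageRank by power iteration on a weighted directed graph.

    weights: dict mapping each node to a dict {neighbor: non-negative weight};
             every neighbor must also appear as a key (hypothetical format).
    seed:    the node the random walk restarts from.
    """
    nodes = sorted(weights)

    # Row-normalize link weights into transition probabilities.
    transition = {}
    for u in nodes:
        total = sum(weights[u].values())
        if total > 0:
            transition[u] = {v: w / total for v, w in weights[u].items()}
        else:
            transition[u] = {}

    # The restart distribution puts all of its mass on the seed node.
    restart = {n: (1.0 if n == seed else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to the seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for u in nodes:
            if transition[u]:
                # Otherwise it follows an outgoing link, chosen by weight.
                for v, p in transition[u].items():
                    nxt[v] += damping * rank[u] * p
            else:
                # A node without outgoing links sends its mass to the seed.
                nxt[seed] += damping * rank[u]
        rank = nxt
    return rank


# Toy concept graph; node names and weights are invented for this example.
graph = {
    "cat": {"feline": 2.0, "pet": 1.0},
    "feline": {"cat": 1.0},
    "pet": {"cat": 1.0, "dog": 1.0},
    "dog": {"pet": 1.0},
}

scores = personalized_pagerank(graph, seed="cat")
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.3f}")
```

In this sketch, a higher score means the walker restarting at the seed visits that node more often, so the score can be read as a proximity to the seed. Turning such scores into a distance between two texts, as the abstract describes, would need an additional step that is not shown here.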