Combining link and content analysis to estimate semantic similarity

  • Authors: Filippo Menczer
  • Affiliations: Indiana University, Bloomington, IN
  • Venue: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
  • Year: 2004

Abstract

Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic associations between pages therefore crucially affect the performance of any search tool. Here I begin to quantitatively analyze the relationship between content, link, and semantic similarity measures across a massive number of Web page pairs. Maps of semantic similarity across textual and link similarity highlight the potential and limitations of lexical and link analysis for relevance approximation, and provide us with a way to study whether and how text- and link-based measures should be combined.
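
To make the two cue-based measures concrete, the following is a minimal sketch of how content and link similarity might be computed for a single pair of pages. The abstract does not specify the measures, so everything here is an assumption: content similarity is taken to be the cosine between raw term-frequency vectors, and link similarity the Jaccard coefficient between sets of linked URLs. The function names and the whitespace tokenizer are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def content_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors.

    Assumption: lowercase whitespace tokenization, no stemming or
    stop-word removal. The abstract does not name a specific measure.
    """
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Dot product over the terms the two pages share.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


def link_similarity(links_a, links_b):
    """Jaccard coefficient between two collections of linked URLs.

    Assumption: each page is represented by the set of URLs in its
    link neighborhood; how that neighborhood is built is left open.
    """
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        return 0.0  # no links on either page
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical toy pages, not data from the paper.
    print(content_similarity("web search ranking", "web search crawling"))
    print(link_similarity(["u1", "u2", "u3"], ["u2", "u3", "u4"]))
```

Both functions return values in [0, 1], so the scores for a page pair can be placed on a common two-dimensional grid, which is the kind of layout on which a third quantity such as semantic similarity could then be mapped.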