WordRank-Based Lexical Signatures for Finding Lost or Related Web Pages

  • Authors:
  • Xiaojun Wan; Jianwu Yang

  • Affiliations:
  • Institute of Computer Science and Technology, Peking University, Beijing, China; Institute of Computer Science and Technology, Peking University, Beijing, China

  • Venue:
  • APWeb'06: Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research and Development
  • Year:
  • 2006

Abstract

A lexical signature of a web page consists of several keywords carefully chosen from the page and is used to generate a robust hyperlink that can find the page when its URL fails. In this paper, we propose a novel method based on WordRank to compute lexical signatures; it takes into account the semantic relatedness between words and chooses the most representative and salient words as the lexical signature. Experiments show that DF-based lexical signatures are best at uniquely identifying web pages, hybrid lexical signatures are good candidates for retrieving the desired web pages, and WordRank-based lexical signatures are best for retrieving highly relevant web pages when the desired web page cannot be extracted.
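
The abstract does not spell out how WordRank scores words, so the following is only a minimal sketch of the general idea of ranking words on a relatedness graph and keeping the top few as a signature. It assumes a PageRank-style iteration, and it substitutes simple co-occurrence within a sliding window for the semantic relatedness measure used in the paper. The function name `wordrank_signature` and the parameters `k`, `window`, `damping`, and `iterations` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def wordrank_signature(words, k=5, window=3, damping=0.85, iterations=50):
    """Return the k top-ranked words of a page as a lexical signature.

    Assumption: co-occurrence within a sliding window stands in for the
    semantic relatedness measure, which the abstract does not specify.
    """
    # Build an undirected weighted graph over the distinct words.
    weights = defaultdict(lambda: defaultdict(float))
    for start in range(len(words)):
        span = set(words[start:start + window])
        for a, b in combinations(sorted(span), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0

    vocab = sorted(set(words))
    if not vocab:
        return []
    scores = {w: 1.0 / len(vocab) for w in vocab}
    out_weight = {w: sum(weights[w].values()) for w in vocab}

    # PageRank-style power iteration on the weighted graph (an assumption).
    for _ in range(iterations):
        new_scores = {}
        for w in vocab:
            incoming = sum(
                scores[v] * weights[v][w] / out_weight[v]
                for v in weights[w]
                if out_weight[v] > 0
            )
            new_scores[w] = (1.0 - damping) / len(vocab) + damping * incoming
        scores = new_scores

    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(vocab, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

Under these assumptions, a call such as `wordrank_signature(page_text.lower().split(), k=5)` would yield five candidate terms that could be submitted to a search engine as a query when the page's URL no longer resolves. A real implementation would also need stop-word removal and a genuine relatedness measure.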