A path-based approach for web page retrieval

  • Authors:
  • Jian-Qiang Li;Yu Zhao;Hector Garcia-Molina

  • Affiliations:
  • NEC Labs China, Beijing, China 100084;NEC Labs China, Beijing, China 100084;Department of Computer Science, Stanford University, Stanford, USA 94305-9040

  • Venue:
  • World Wide Web
  • Year:
  • 2012

Abstract

The use of links to enhance page ranking has been widely studied. The underlying assumption is that links convey recommendations. Although this technique has been used successfully in global web search, it produces poor results for website search, because the majority of the links within a website serve to organize information and convey no recommendation. This paper distinguishes these two kinds of links, those that recommend and those that organize information, and describes a path-based method for web page ranking. We define the Hierarchical Navigation Path (HNP) as a new resource for improving web search. An HNP consists of the multi-step navigation information from visitors' browsing of a website, and it indicates the content of the destination page. We first classify the links inside a website. We then use the links that organize web pages to construct the HNPs for each page. Finally, we describe the PathRank algorithm for web page retrieval. Experiments show that our approach yields significant improvements over existing solutions.
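To make the pipeline in the abstract concrete, the following is a minimal illustrative sketch. It is not the authors' PathRank algorithm, whose details the abstract does not give. It assumes that links have already been classified, that only the organizational links are kept, and that a navigation path can be approximated by the anchor texts along the shortest route from the site root. The names `build_paths`, `path_score`, and `rank`, the example site, and the term-overlap score are all hypothetical.

```python
from collections import deque


def build_paths(org_links, root):
    """Breadth-first walk over organizational links (assumed pre-classified).

    org_links maps a page to a list of (anchor_text, child_page) pairs.
    Returns {page: [anchor_text, ...]}: the anchor texts along the
    shortest route from root to each reachable page.
    """
    paths = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for anchor, child in org_links.get(page, []):
            if child not in paths:  # keep only the first (shortest) path
                paths[child] = paths[page] + [anchor]
                queue.append(child)
    return paths


def path_score(query, path):
    """Toy score (an assumption, not PathRank): fraction of query terms
    that occur in the anchor texts along the path."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    path_terms = set(" ".join(path).lower().split())
    return sum(t in path_terms for t in terms) / len(terms)


def rank(query, paths):
    """Order pages by descending toy path score."""
    return sorted(paths, key=lambda p: path_score(query, paths[p]), reverse=True)


# Hypothetical website: organizational links only.
org_links = {
    "/": [("Products", "/products"), ("Support", "/support")],
    "/products": [("Laptops", "/products/laptops")],
    "/support": [("Laptop drivers", "/support/drivers")],
}
paths = build_paths(org_links, "/")
ranking = rank("products laptops", paths)
```

In this sketch the path to `/products/laptops` is `["Products", "Laptops"]`, so a query such as "products laptops" matches the page through its navigation context even if the page body never mentions those words. That is the intuition the abstract attributes to HNPs: the path indicates the content of the destination page.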