Computing pagerank in a distributed internet search system

  • Authors:
  • Yuan Wang;David J. DeWitt

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin - Madison, Madison, WI;Computer Sciences Department, University of Wisconsin - Madison, Madison, WI

  • Venue:
  • VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Existing Internet search engines use web crawlers to download data from the Web. Page quality is measured on central servers, where user queries are also processed. This paper argues that using crawlers has a list of disadvantages. Most importantly, crawlers do not scale. Even Google, the leading search engine, indexes less than 1% of the entire Web. This paper proposes a distributed search engine framework, in which every web server answers queries over its own data. Results from multiple web servers will be merged to generate a ranked hyperlink list on the submitting server. This paper presents a series of algorithms that compute PageRank in such framework. The preliminary experiments on a real data set demonstrate that the system achieves comparable accuracy on PageRank vectors to Google's well-known PageRank algorithm and, therefore, high quality of query results.