Computing PageRank in a Distributed Internet Search System

Authors:
Yuan Wang; David J. DeWitt
Affiliations:
Computer Sciences Department, University of Wisconsin - Madison, Madison, WI; Computer Sciences Department, University of Wisconsin - Madison, Madison, WI
Venue:
VLDB '04: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30
Year:
2004

Abstract

Existing Internet search engines use web crawlers to download data from the Web. Page quality is measured on central servers, where user queries are also processed. This paper argues that crawling has several disadvantages; most importantly, crawlers do not scale. Even Google, the leading search engine, indexes less than 1% of the entire Web. This paper proposes a distributed search engine framework in which every web server answers queries over its own data. Results from multiple web servers are merged into a ranked hyperlink list on the submitting server. The paper presents a series of algorithms that compute PageRank in such a framework. Preliminary experiments on a real data set show that the system's PageRank vectors are comparable in accuracy to those produced by Google's well-known PageRank algorithm and that the system therefore delivers high-quality query results.
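
For orientation, the centralized baseline that the abstract compares against can be sketched as a standard power iteration. This is not the distributed algorithm proposed in the paper, whose details the abstract does not give. The function name `pagerank`, the damping factor of 0.85, the uniform handling of dangling pages, and the three-page graph `web` are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard centralized PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its damped rank evenly among its out-links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the L1 change between iterations is below the tolerance.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical three-page graph: a links to b and c, b links to c, c links to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

In this toy graph, page `c` receives links from both other pages and ends up with the highest score. A distributed variant along the lines the abstract describes would have to obtain comparable scores without collecting the whole link graph on one machine.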