Efficient and decentralized PageRank approximation in a peer-to-peer web search network

Authors:
Josiane Xavier Parreira;Debora Donato;Sebastian Michel;Gerhard Weikum
Affiliations:
Max-Planck Institute for Computer Science, Saarbrücken, Germany;Universita di Roma "La Sapienza", Roma, Italy;Max-Planck Institute for Computer Science, Saarbrücken, Germany;Max-Planck Institute for Computer Science, Saarbrücken, Germany
Venue:
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Year:
2006

Abstract

PageRank-style (PR) link analyses are a cornerstone of Web search engines and Web mining, but they are computationally expensive. Recently, various techniques have been proposed for speeding up these analyses by distributing the link graph among multiple sites. However, none of these advanced methods is suitable for a fully decentralized PR computation in a peer-to-peer (P2P) network with autonomous peers, where each peer can independently crawl Web fragments according to the user's thematic interests. In such a setting the graph fragments that different peers have locally available or know about may arbitrarily overlap among peers, creating additional complexity for the PR computation.

This paper presents the JXP algorithm for dynamically and collaboratively computing PR scores of Web pages that are arbitrarily distributed in a P2P network. The algorithm runs at every peer, and it works by combining locally computed PR scores with random meetings among the peers in the network. It is scalable as the number of peers on the network grows, and experiments as well as theoretical arguments show that JXP scores converge to the true PR scores that one would obtain by a centralized computation.
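For orientation, the centralized computation that the abstract uses as its reference point can be sketched as a plain power iteration. This is not the JXP algorithm, whose details the abstract does not give; it only illustrates the "true PR scores" that JXP is said to converge to. The function name `pagerank`, the damping factor 0.85, the uniform handling of dangling pages, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Centralized PageRank by power iteration (illustrative baseline only).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Score mass held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(scores[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its damped score in equal shares to its targets.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new[q] += damping * scores[p] / len(targets)

        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new[p] - scores[p]) for p in pages)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical four-page graph; in the P2P setting each peer would hold
# only a (possibly overlapping) fragment of such a graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

In this toy graph, page `c` collects links from every other page and ends up with the highest score, while `d`, which nothing links to, keeps only the uniform random-jump share. A decentralized method has to approximate these values without any single peer ever seeing the whole `graph`.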