Efficiently Handling Dynamics in Distributed Link Based Authority Analysis

Authors:
Josiane Xavier Parreira;Sebastian Michel;Gerhard Weikum
Affiliations:
Max-Planck Institute for Informatics, Saarbrücken, Germany;Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland;Max-Planck Institute for Informatics, Saarbrücken, Germany
Venue:
WISE '08 Proceedings of the 9th international conference on Web Information Systems Engineering
Year:
2008

Abstract

Link-based authority analysis is an important tool for ranking resources in social networks and other graphs. Previous work has presented $\mathrm{J^{X}_P}$, a decentralized algorithm for computing PageRank scores. The algorithm is designed to work in distributed systems, such as peer-to-peer (P2P) networks. However, the dynamics of P2P networks, one of their main characteristics, are currently not handled by the algorithm. This paper shows how to adapt $\mathrm{J^{X}_P}$ to work under network churn. First, we present a distributed algorithm that estimates the number of distinct documents in the network, which is needed in the local computation of the PageRank scores. We then present a method that enables each peer to detect when other peers leave and to update its view of the network. We show that the number of stored items in the network can be efficiently estimated, with little overhead on the network traffic. Second, we present an extension of the original $\mathrm{J^{X}_P}$ algorithm that can cope with network and content dynamics. We show the practical usability of our approach through a comprehensive performance analysis. The proposed estimators, together with the changes in the core $\mathrm{J^{X}_P}$ components, allow for fast authority score computation even under heavy churn. We believe that this is the last missing step toward the application of distributed PageRank measures in real-life large-scale applications.