A comparison of three algorithms for approximating the distance distribution in real-world graphs

Authors:
Pierluigi Crescenzi; Roberto Grossi; Leonardo Lanzi; Andrea Marino
Affiliations:
Dipartimento di Sistemi e Informatica, Università di Firenze (Crescenzi, Lanzi, Marino); Dipartimento di Informatica, Università di Pisa (Grossi)
Venue:
TAPAS'11 Proceedings of the First international ICST conference on Theory and practice of algorithms in (computer) systems
Year:
2011

References (15):
Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences.
Query size estimation by adaptive sampling. Selected papers of the 9th annual ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences.
Fast approximation of centrality. SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms.
Distributed Algorithms.
External-Memory Breadth-First Search with Sublinear I/O. ESA '02: Proceedings of the 10th Annual European Symposium on Algorithms.
ANF: a fast and scalable tool for data mining in massive graphs. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
The webgraph framework I: compression techniques. Proceedings of the 13th international conference on World Wide Web.
Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD).
Spatially-decaying aggregation over a network. Journal of Computer and System Sciences.
Bottom-k sketches: better and more efficient estimation of aggregates. Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems.
Summarizing data using bottom-k sketches. Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing.
Estimating the size of the transitive closure in linear time. SFCS '94: Proceedings of the 35th Annual Symposium on Foundations of Computer Science.
Tighter estimation using bottom k sketches. Proceedings of the VLDB Endowment.
Finding the diameter in real-world graphs: experimentally turning a lower bound into an upper bound. ESA'10: Proceedings of the 18th annual European conference on Algorithms: Part I.

Cited by (4):
Injecting uncertainty in graphs for identity obfuscation. Proceedings of the VLDB Endowment.
Four degrees of separation. Proceedings of the 3rd Annual ACM Web Science Conference.
Scalable similarity estimation in social networks: closeness, node labels, and random edge lengths. Proceedings of the first ACM conference on Online social networks.
Call me maybe: understanding nature and risks of sharing mobile numbers on online social networks. Proceedings of the first ACM conference on Online social networks.

Abstract

The distance between a pair of vertices in a graph G is the length of a shortest path between them. The distance distribution of G specifies how many vertex pairs are at distance h, for every feasible value of h. We study three fast randomized algorithms for approximating the distance distribution of large graphs. The Eppstein-Wang (EW) algorithm relies on sampling, running only a limited (logarithmic) number of Breadth-First Searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos uses the probabilistic counting technique introduced by Flajolet and Martin for estimating the number of distinct elements in a large multiset. We investigate how well the three algorithms approximate the distance distribution when they are run in similar settings. The analysis of ANF follows from the results on the probabilistic counting method, while that of SEF is given by Cohen. As for EW, which was originally designed to approximate closeness centrality, we extend its simple analysis to bound its error with high probability and to show its convergence. We then carry out an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
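The sampling idea behind EW can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph stored as an adjacency dict and counts ordered pairs (u, v) with u != v. An exact computation (one BFS per vertex) is placed next to a sampling estimator that runs BFS from k sources drawn uniformly at random with replacement and scales the counts by n/k. The function names and the sampling-with-replacement choice are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter, deque


def bfs_distances(adj, source):
    """Return a dict mapping every vertex reachable from source to its hop distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exact_distance_distribution(adj):
    """Count ordered pairs (u, v), u != v, at each finite distance h (n BFSes)."""
    counts = Counter()
    for s in adj:
        for v, h in bfs_distances(adj, s).items():
            if v != s:
                counts[h] += 1
    return dict(counts)


def sampled_distance_distribution(adj, k, seed=None):
    """EW-style estimate (assumed variant): BFS from k uniformly random sources,
    drawn with replacement, then scale each per-distance count by n / k."""
    rng = random.Random(seed)
    vertices = list(adj)
    n = len(vertices)
    counts = Counter()
    for _ in range(k):
        s = rng.choice(vertices)
        for v, h in bfs_distances(adj, s).items():
            if v != s:
                counts[h] += 1
    return {h: c * n / k for h, c in counts.items()}
```

Each sampled BFS contributes, in expectation, N(h)/n pairs at distance h, so the scaled sum is an unbiased estimate of N(h). Choosing k on the order of log n matches the logarithmic number of BFSes mentioned above; the exact constant needed for a given error bound is not assumed here.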
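Cohen's least-element (LE) lists can be sketched in the same style. This sketch assumes an undirected graph, draws ranks from an exponential distribution, and builds each vertex's LE list with pruned BFSes processed in increasing rank order. It then estimates the ball size |B(v, h)| from k independent rankings with the (k-1)/(sum of minimum ranks) estimator, an estimator of the kind used in Cohen's framework. Function names and parameters are hypothetical, chosen for illustration; this is not the paper's implementation.

```python
import math
import random
from collections import deque


def least_element_lists(adj, ranks):
    """LE lists for an undirected graph: le[v] holds (distance, rank) pairs.
    Entries are appended in increasing rank order, so distances strictly decrease."""
    le = {v: [] for v in adj}
    for u in sorted(adj, key=ranks.__getitem__):
        r = ranks[u]
        dist = {u: 0}
        queue = deque([u])
        while queue:
            v = queue.popleft()
            d = dist[v]
            # Prune: a lower-ranked vertex already lies within distance d of v.
            if le[v] and le[v][-1][0] <= d:
                continue
            le[v].append((d, r))
            for w in adj[v]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return le


def min_rank_within(le_v, h):
    """Smallest rank among vertices within distance h, read off one LE list."""
    for d, r in le_v:  # ranks increase while distances decrease along the list
        if d <= h:
            return r
    return math.inf


def sef_distance_distribution(adj, k=32, max_h=10, seed=0):
    """Estimate ordered pairs at distance h = 1..max_h (assumes k >= 2)."""
    rng = random.Random(seed)
    lists = []
    for _ in range(k):
        ranks = {v: rng.expovariate(1.0) for v in adj}
        lists.append(least_element_lists(adj, ranks))

    def ball_size(v, h):
        # Minima of n Exp(1) ranks are Exp(n); (k-1)/sum of k minima estimates n.
        total = sum(min_rank_within(le[v], h) for le in lists)
        return (k - 1) / total

    neighbourhood = [sum(ball_size(v, h) for v in adj) for h in range(max_h + 1)]
    return {h: neighbourhood[h] - neighbourhood[h - 1] for h in range(1, max_h + 1)}
```

Because every vertex appears in its own LE list at distance 0, the minimum rank is always defined. Beyond the diameter the minima stop changing, so the estimated counts for larger h are exactly zero.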
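ANF's use of Flajolet-Martin (FM) counting can be sketched as follows, again assuming an undirected graph stored as an adjacency dict. Every vertex keeps k FM bitmaps for its ball B(v, h), and one round ORs in the neighbours' bitmaps from the previous round. The size of a ball is estimated as 2 raised to the average lowest-zero-bit index, divided by the FM constant 0.77351. Pseudo-random bit positions drawn from `random.Random` stand in for the hash functions of a real implementation. This is an illustrative sketch with hypothetical names, not the ANF code of Palmer, Gibbons, and Faloutsos.

```python
import random

PHI = 0.77351  # Flajolet-Martin correction constant


def geometric_bit(rng, max_bits=32):
    """Return i with probability 2^-(i+1), mimicking FM hashing (capped at max_bits-1)."""
    i = 0
    while i < max_bits - 1 and rng.random() < 0.5:
        i += 1
    return i


def lowest_zero_bit(mask):
    """Index of the least significant 0 bit of mask."""
    i = 0
    while mask & (1 << i):
        i += 1
    return i


def fm_estimate(masks):
    """FM estimate from k bitmaps: 2^(mean lowest-zero index) / PHI."""
    mean_r = sum(lowest_zero_bit(m) for m in masks) / len(masks)
    return 2 ** mean_r / PHI


def anf_distance_distribution(adj, k=64, max_h=None, seed=0):
    """ANF-style estimate of the number of ordered pairs at each distance h >= 1."""
    rng = random.Random(seed)
    vertices = list(adj)
    # Round 0: B(v, 0) = {v}; each of the k copies gets its own random bit.
    masks = {v: [1 << geometric_bit(rng) for _ in range(k)] for v in vertices}
    prev_total = sum(fm_estimate(masks[v]) for v in vertices)
    dist = {}
    h = 0
    while max_h is None or h < max_h:
        h += 1
        new = {}
        for v in vertices:
            row = list(masks[v])
            for u in adj[v]:  # B(v, h) is the union of B(u, h-1) over u in N[v]
                row = [a | b for a, b in zip(row, masks[u])]
            new[v] = row
        if new == masks:  # no bitmap changed: the sketched balls stopped growing
            break
        masks = new
        total = sum(fm_estimate(masks[v]) for v in vertices)
        dist[h] = total - prev_total  # estimated neighbourhood function difference
        prev_total = total
    return dist
```

Bitmaps only gain bits, so each estimate is non-decreasing in h and the per-distance differences are never negative. Plain FM estimates are noticeably biased for very small sets, so on tiny graphs the individual values are only rough.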