The distance between a pair of vertices in a graph G is the length of the shortest path between them. The distance distribution of G specifies how many vertex pairs are at distance h, for all feasible values of h. We study three fast randomized algorithms for approximating the distance distribution in large graphs. The Eppstein-Wang (EW) algorithm exploits sampling through a limited (logarithmic) number of Breadth-First Searches (BFSes). The Size-Estimation Framework (SEF) by Cohen employs random ranking and least-element lists to provide several estimators. Finally, the Approximate Neighborhood Function (ANF) algorithm by Palmer, Gibbons, and Faloutsos uses the probabilistic counting technique introduced by Flajolet and Martin to estimate the number of distinct elements in a large multiset. We investigate how well the three algorithms approximate the distance distribution when they are run in similar settings. The analysis of ANF derives from the results on the probabilistic counting method, while that of SEF is given by Cohen. As for EW (originally designed for another problem), we extend its simple analysis to bound its error with high probability and to show its convergence. We then perform an experimental study on 30 real-world graphs, showing that our implementation of EW combines the accuracy of SEF with the performance of ANF.
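The sampling idea behind EW can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an unweighted graph stored as an adjacency dictionary, draws k source vertices uniformly without replacement, runs one BFS per source, and scales the per-distance counts by n/k to estimate the number of ordered vertex pairs at each distance. The function names `bfs_distances` and `sampled_distance_distribution`, the scaling rule, and the choice of k are assumptions made for illustration.

```python
import random
from collections import Counter, deque


def bfs_distances(adj, source):
    """Return {vertex: distance} for every vertex reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sampled_distance_distribution(adj, k, rng=None):
    """Estimate the number of ordered vertex pairs at each distance h.

    Assumption (illustrative, not taken from the paper): sources are
    sampled uniformly without replacement and counts are scaled by n/k.
    """
    rng = rng or random.Random()
    vertices = list(adj)
    n = len(vertices)
    k = min(k, n)
    sources = rng.sample(vertices, k)

    counts = Counter()
    for s in sources:
        for d in bfs_distances(adj, s).values():
            if d > 0:  # skip the source itself
                counts[d] += 1

    scale = n / k
    return {h: c * scale for h, c in sorted(counts.items())}


# Path graph 0 - 1 - 2 - 3 as an undirected adjacency dictionary.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
exact = sampled_distance_distribution(path, k=4)
```

With k equal to the number of vertices, every vertex is a source and the result is the exact count of ordered pairs. Smaller values of k trade accuracy for fewer BFSes, which is the regime the abstract describes with its logarithmic number of searches.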