HyperANF: approximating the neighbourhood function of very large graphs on a budget

Authors:
Paolo Boldi;Marco Rosa;Sebastiano Vigna
Affiliations:
Università degli Studi di Milano, Milano, Italy;Università degli Studi di Milano, Milano, Italy;Università degli Studi di Milano, Milano, Italy
Venue:
Proceedings of the 20th international conference on World wide web
Year:
2011

Citing 7
Cited 14

Size-estimation framework with applications to transitive closure and reachability

Journal of Computer and System Sciences
The space complexity of approximating the frequency moments

Journal of Computer and System Sciences
ANF: a fast and scalable tool for data mining in massive graphs

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
The webgraph framework I: compression techniques

Proceedings of the 13th international conference on World Wide Web
Broadword implementation of rank/select queries

WEA'08 Proceedings of the 7th international conference on Experimental algorithms
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
HADI: Mining Radii of Large Graphs

ACM Transactions on Knowledge Discovery from Data (TKDD)

Robustness of social networks: comparative results based on distance distributions

SocInfo'11 Proceedings of the Third international conference on Social informatics
Injecting uncertainty in graphs for identity obfuscation

Proceedings of the VLDB Endowment
On computing the diameter of real-world directed (weighted) graphs

SEA'12 Proceedings of the 11th international conference on Experimental Algorithms
Four degrees of separation

Proceedings of the 3rd Annual ACM Web Science Conference
Impact neighborhood indexing (INI) in diffusion graphs

Proceedings of the 21st ACM international conference on Information and knowledge management
Evolution of social-attribute networks: measurements, modeling, and implications using google+

Proceedings of the 2012 ACM conference on Internet measurement conference
Four Degrees of Separation, Really

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Competition-based networks for expert finding

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
How social network is evolving?: a preliminary study on billion-scale twitter network

Proceedings of the 22nd international conference on World Wide Web companion
Scalable similarity estimation in social networks: closeness, node labels, and random edge lengths

Proceedings of the first ACM conference on Online social networks
Call me maybe: understanding nature and risks of sharing mobile numbers on online social networks

Proceedings of the first ACM conference on Online social networks
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

ACM SIGOPS 24th Symposium on Operating Systems Principles
X-Stream: edge-centric graph processing using streaming partitions

Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Quantified Score

Hi-index	0.01

Abstract

The neighbourhood function N_G(t) of a graph G gives, for each t ∈ N, the number of pairs of nodes x, y such that y is reachable from x in less than t hops. The neighbourhood function provides a wealth of information about the graph [10] (e.g., it easily allows one to compute its diameter), but computing it exactly is very expensive. Recently, the ANF algorithm [10] (approximate neighbourhood function) has been proposed with the purpose of approximating N_G(t) on large graphs. We describe a breakthrough improvement over ANF in terms of speed and scalability. Our algorithm, called HyperANF, uses the new HyperLogLog counters [5] and combines them efficiently through broadword programming [8]; our implementation uses task decomposition to exploit multi-core parallelism. With HyperANF, for the first time we can compute in a few hours the neighbourhood function of graphs with billions of nodes with a small error and good confidence using a standard workstation. Then, we turn to the study of the distribution of the distances between reachable nodes (which can be efficiently approximated by means of HyperANF), and discover the surprising fact that its index of dispersion provides a clear-cut characterisation of proper social networks vs. web graphs. We thus propose the spid (Shortest-Paths Index of Dispersion) of a graph as a new, informative statistic that is able to discriminate between the above two types of graphs. We believe this is the first proposal of a significant new non-local structural index for complex networks whose computation is highly scalable.
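
To make the quantities in the abstract concrete, the following is a minimal sketch that computes them exactly, by brute-force breadth-first search, on a tiny graph. It is not HyperANF: it uses no HyperLogLog counters or broadword tricks and would not scale to large graphs. The function names, the adjacency-dictionary input format, and the example graph are all illustrative assumptions. Two conventions are also assumed here and may differ from the paper's: the cumulative count at index t covers pairs at distance at most t, and the spid is taken as the variance-to-mean ratio of the distances between distinct reachable pairs (distance 0 excluded).

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_counts(adj):
    """counts[d] = number of ordered pairs (x, y) with distance exactly d.

    Assumption: every node appears as a key of `adj`, possibly with an
    empty successor list.
    """
    counts = {}
    for x in adj:
        for d in bfs_distances(adj, x).values():
            counts[d] = counts.get(d, 0) + 1
    return counts


def neighbourhood_function(adj):
    """Exact cumulative pair counts; index t covers distances 0..t (assumed convention)."""
    counts = distance_counts(adj)
    total, result = 0, []
    for t in range(max(counts) + 1):
        total += counts.get(t, 0)
        result.append(total)
    return result


def spid(adj):
    """Variance-to-mean ratio of distances between distinct reachable pairs."""
    counts = {d: c for d, c in distance_counts(adj).items() if d > 0}
    n = sum(counts.values())
    mean = sum(d * c for d, c in counts.items()) / n
    var = sum(c * (d - mean) ** 2 for d, c in counts.items()) / n
    return var / mean


# Hypothetical example: the directed path 0 -> 1 -> 2 -> 3.
path = {0: [1], 1: [2], 2: [3], 3: []}
print(neighbourhood_function(path))  # [4, 7, 9, 10]
print(spid(path))                    # approximately 1/3
```

On this path graph the cumulative counts stop growing at index 3, which is the diameter, illustrating how the diameter can be read off the neighbourhood function. An approximate algorithm in the spirit of ANF or HyperANF would replace the exact per-node reachable sets with small probabilistic counters.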