Given large, multimillion-node graphs (e.g., Facebook, Web crawls), how do they evolve over time? How are they connected? What are the central nodes and the outliers? In this article we define the Radius plot of a graph and show how it can answer these questions. However, computing the Radius plot is prohibitively expensive for graphs reaching the planetary scale. This article makes two major contributions: (a) We propose HADI (HAdoop DIameter and radii estimator), a carefully designed and fine-tuned algorithm that computes the radii and the diameter of massive graphs and runs on top of the Hadoop/MapReduce system, with excellent scale-up on the number of available machines. (b) We run HADI on several real-world datasets, including YahooWeb (6B edges, 1/8 of a Terabyte), one of the largest public graphs ever analyzed. Thanks to HADI, we report fascinating patterns on large networks, such as the surprisingly small effective diameter, the multimodal/bimodal shape of the Radius plot, and its palindrome motion over time.
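To make the quantities above concrete, here is a minimal Python sketch of an exact, single-machine computation on a toy graph. It is not HADI: it assumes that the radius of a node is the largest hop distance from that node to any node it can reach, that the Radius plot is the count of nodes per radius value, and that the diameter is the largest radius. The abstract does not spell out these definitions, and the function names `eccentricity` and `radius_plot` are hypothetical. Running one breadth-first search per node costs roughly O(nodes × edges), which suggests why an exact approach becomes impractical at the scale the abstract describes.

```python
from collections import Counter, deque


def eccentricity(adj, source):
    """Return the largest BFS hop distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def radius_plot(adj):
    """Return (radius per node, number of nodes per radius value).

    Assumption: "radius of a node" means its exact eccentricity.
    """
    radii = {node: eccentricity(adj, node) for node in adj}
    return radii, Counter(radii.values())


# Toy undirected path graph 0 - 1 - 2 - 3 - 4, stored as adjacency lists.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

radii, histogram = radius_plot(adj)
diameter = max(radii.values())  # assumption: diameter = largest radius

print(radii)
print(sorted(histogram.items()))
print(diameter)
```

On this path graph the middle node has the smallest radius and the two endpoints have the largest, which illustrates how a per-node radius can separate central nodes from outliers.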