Estimating clustering indexes in data streams

Authors:
Luciana S. Buriol;Gereon Frahling;Stefano Leonardi;Christian Sohler
Affiliations:
Federal University of Rio Grande do Sul, Porto Alegre, Brazil;Google Research, New York;University of Rome "La Sapienza", Rome, Italy;Heinz Nixdorf Institute and University of Paderborn, Paderborn, Germany
Venue:
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Year:
2007

References: 13
Cited by: 8

Matrix multiplication via arithmetic progressions

Journal of Symbolic Computation - Special issue on computational algebraic complexity
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Data-streams and histograms

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Reductions in streaming algorithms, with an application to counting triangles in graphs

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
Stochastic models for the Web graph

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Structural and algorithmic aspects of massive social networks

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling in dynamic data streams and applications

SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Counting triangles in data streams

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Temporal Analysis of the Wikigraph

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
New streaming algorithms for counting triangles in graphs

COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Finding, counting and listing all triangles in large graphs, an experimental study

WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms

Private coresets

Proceedings of the forty-first annual ACM symposium on Theory of computing
Efficient algorithms for large-scale local triangle counting

ACM Transactions on Knowledge Discovery from Data (TKDD)
Approximate counting of cycles in streams

ESA'11 Proceedings of the 19th European conference on Algorithms
Streaming and communication complexity of clique approximation

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I
Counting arbitrary subgraphs in data streams

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part II
On the streaming complexity of computing local clustering coefficients

Proceedings of the sixth ACM international conference on Web search and data mining
Parallel triangle counting in massive streaming graphs

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Counting and sampling triangles from a graph stream

Proceedings of the VLDB Endowment

Abstract

We present random sampling algorithms that, with probability at least 1 - δ, compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number K_{3,3} of bipartite cliques is proportional to the ratio between the number of K_{1,3} and K_{3,3} in the graph. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. They therefore provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks, and the probing of frequent patterns in large graphs. We implemented both algorithms and evaluated their performance on networks from different application domains and of different sizes. The largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.
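
The sampling idea behind the clustering-coefficient estimate can be illustrated with a minimal in-memory sketch. This is not the authors' streaming algorithm: it assumes the whole edge list fits in memory, it takes the clustering coefficient to be the transitivity (the fraction of length-2 paths, or "wedges", whose endpoints are adjacent), and the function names `build_adjacency` and `estimate_transitivity` are hypothetical. It samples wedges uniformly at random and reports the fraction that close into a triangle.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def estimate_transitivity(edges, num_samples, seed=0):
    """Estimate the fraction of closed wedges by uniform wedge sampling.

    Illustrative assumption: the graph is held in memory, unlike the
    streaming setting described in the abstract.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    rng = random.Random(seed)
    adj = build_adjacency(edges)

    # A vertex of degree d is the center of d * (d - 1) / 2 wedges.
    centers = [v for v in adj if len(adj[v]) >= 2]
    if not centers:
        return 0.0  # no wedges at all
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in centers]
    neighbors = {v: list(adj[v]) for v in centers}

    closed = 0
    # Picking a center proportionally to its wedge count, then two distinct
    # neighbors uniformly, yields a uniformly random wedge.
    for v in rng.choices(centers, weights=weights, k=num_samples):
        a, b = rng.sample(neighbors[v], 2)
        if b in adj[a]:
            closed += 1
    return closed / num_samples


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    example = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(estimate_transitivity(example, num_samples=20000))
```

In this sketch each sample is a coin flip that succeeds with probability equal to the transitivity, so a small coefficient means more samples are needed before any closed wedge is seen at all. That is one intuitive reading of why the space bound in the abstract grows as the clustering coefficient shrinks.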