Counting triangles in data streams

Authors:
Luciana S. Buriol;Gereon Frahling;Stefano Leonardi;Alberto Marchetti-Spaccamela;Christian Sohler
Affiliations:
Universidade Federal de Santa Maria, Brazil;University of Paderborn, Germany;Università di Roma "La Sapienza", Italy;Università di Roma "La Sapienza", Italy;University of Paderborn, Germany
Venue:
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2006

References (13)
Cited by (30)

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Matrix multiplication via arithmetic progressions

Journal of Symbolic Computation - Special issue on computational algebraic complexity
The space complexity of approximating the frequency moments

Journal of Computer and System Sciences
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Data-streams and histograms

STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Algorithmics and applications of tree and graph searching

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Reductions in streaming algorithms, with an application to counting triangles in graphs

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries

Proceedings of the 27th International Conference on Very Large Data Bases
Uniform hashing in constant time and linear space

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Stochastic models for the Web graph

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Graph indexing: a frequent structure-based approach

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
New streaming algorithms for counting triangles in graphs

COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Finding, counting and listing all triangles in large graphs, an experimental study

WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms

Estimating PageRank on graph streams

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient semi-streaming algorithms for local triangle counting in massive graphs

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
DOULION: counting triangles in massive graphs with a coin

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimal sampling from sliding windows

Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Annotations in Data Streams

ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Estimating clustering indexes in data streams

ESA'07 Proceedings of the 15th annual European conference on Algorithms
Aggregate computation over data streams

APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Efficient algorithms for large-scale local triangle counting

ACM Transactions on Knowledge Discovery from Data (TKDD)
Clustering coefficient queries on massive dynamic social networks

WAIM'10 Proceedings of the 11th international conference on Web-age information management
Intractability of min- and max-cut in streaming graphs

Information Processing Letters
Counting triangles and the curse of the last reducer

Proceedings of the 20th international conference on World wide web
Estimating PageRank on graph streams

Journal of the ACM (JACM)
Triangle listing in massive networks and its applications

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Structural trend analysis for online social networks

Proceedings of the VLDB Endowment
Approximate counting of cycles in streams

ESA'11 Proceedings of the 19th European conference on Algorithms
Improved sampling for triangle counting with MapReduce

ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology
Optimal sampling from sliding windows

Journal of Computer and System Sciences
gSketch: on query estimation in graph streams

Proceedings of the VLDB Endowment
Colorful triangle counting and a MapReduce implementation

Information Processing Letters
Graph sketches: sparsification, spanners, and subgraphs

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Counting arbitrary subgraphs in data streams

ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part II
Triangle listing in massive networks

ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Streaming algorithms measured in terms of the computed quantity

COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics
On the streaming complexity of computing local clustering coefficients

Proceedings of the sixth ACM international conference on Web search and data mining
A space efficient streaming algorithm for triangle counting using the birthday paradox

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Estimating clustering coefficients and size of social networks via random walk

Proceedings of the 22nd international conference on World Wide Web
How hard is counting triangles in the streaming model?

ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I
Counting and sampling triangles from a graph stream

Proceedings of the VLDB Endowment
Load balanced clustering coefficients

Proceedings of the first workshop on Parallel programming for analytics applications
Order matters! Harnessing a world of orderings for reasoning over massive data

Semantic Web

Abstract

We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm makes no assumptions about the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and the number of paths of length 2 in the graph, and expected update time O(log|V|⋅(1+s⋅|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size and so provide a basic tool for analyzing the structure of large graphs. They have many applications, for example in the discovery of Web communities, the computation of clustering and transitivity coefficients, and the discovery of frequent patterns in large graphs.

We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with parameter s = 1,000, 10,000, 100,000, and 1,000,000 to evaluate running time and approximation guarantee. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm varied significantly, and even for s=1,000,000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better, and even for s=10,000 we had an average deviation of less than 6% (taken over all but the largest instance, for which we could not compute the number of triangles exactly).
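
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one estimator in the spirit of the first algorithm (arbitrary edge order), not the authors' implementation. It assumes vertices are labelled 0..num_vertices-1, that num_vertices is known in advance and is at least 3, and that each edge appears once in the stream. The function name `estimate_triangles` and its parameters are hypothetical. Each of the s samplers keeps one uniformly chosen edge (a, b) via reservoir sampling, picks a random third vertex v, and records whether both (a, v) and (b, v) arrive later in the stream. The fraction of successful samplers, scaled by |E|⋅(|V|−2), estimates the triangle count.

```python
import random


def estimate_triangles(edges, num_vertices, s, seed=None):
    """One-pass triangle-count estimate from a stream of undirected edges.

    Assumptions (not from the source): vertices are 0..num_vertices-1,
    num_vertices >= 3, and no edge is repeated in the stream.
    """
    rng = random.Random(seed)
    # Per-sampler state: [needed_edge_1, needed_edge_2, seen_1, seen_2]
    samplers = [None] * s
    m = 0  # number of edges seen so far

    for x, y in edges:
        m += 1
        edge = frozenset((x, y))
        for i in range(s):
            # Reservoir sampling of size 1: keep the new edge with prob. 1/m.
            # For m == 1 this always succeeds, so every sampler is initialised.
            if rng.random() < 1.0 / m:
                # Pick the third vertex uniformly from V \ {x, y}.
                v = rng.randrange(num_vertices)
                while v == x or v == y:
                    v = rng.randrange(num_vertices)
                samplers[i] = [frozenset((x, v)), frozenset((y, v)), False, False]
            else:
                state = samplers[i]
                if edge == state[0]:
                    state[2] = True
                elif edge == state[1]:
                    state[3] = True

    if m == 0:
        return 0.0
    # A sampler succeeds only if it holds the first-arriving edge of a
    # triangle and guessed that triangle's third vertex.
    hits = sum(1 for state in samplers if state[2] and state[3])
    return hits / s * m * (num_vertices - 2)
```

Note that this sketch touches all s samplers on every edge, so it does not achieve the constant expected update time stated in the abstract; it only illustrates why the required sample size grows as triangles become rare relative to |E|⋅(|V|−2). On a triangle-free stream no sampler can succeed, so the estimate is exactly zero.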