Matrix multiplication via arithmetic progressions
Journal of Symbolic Computation - Special issue on computational algebraic complexity
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Stochastic models for the Web graph
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Structural and algorithmic aspects of massive social networks
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Sampling in dynamic data streams and applications
SCG '05 Proceedings of the twenty-first annual symposium on Computational geometry
Counting triangles in data streams
Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Temporal Analysis of the Wikigraph
WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
New streaming algorithms for counting triangles in graphs
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Finding, counting and listing all triangles in large graphs, an experimental study
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Proceedings of the forty-first annual ACM symposium on Theory of computing
Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)
Approximate counting of cycles in streams
ESA'11 Proceedings of the 19th European conference on Algorithms
Streaming and communication complexity of clique approximation
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part I
Counting arbitrary subgraphs in data streams
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part II
On the streaming complexity of computing local clustering coefficients
Proceedings of the sixth ACM international conference on Web search and data mining
Parallel triangle counting in massive streaming graphs
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Counting and sampling triangles from a graph stream
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
We present random sampling algorithms that with probability at least 1 - δ compute a (1 ± Ɛ)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges. The space used by our algorithm to estimate the clustering coefficient is inversely related to the clustering coefficient of the network itself. The space used by our algorithm to compute the number K3,3 of bipartite cliques is proportional to the ratio between the number of K1,3 and K3,3 in the graph. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size. Therefore they provide a basic tool to analyze the structure of dense clusters in large graphs and have many applications in the discovery of web communities, the analysis of the structure of large social networks and the probing of frequent patterns in large graphs. We implemented both algorithms and evaluated their performance on networks from different application domains and of different size; The largest instance is a webgraph consisting of more than 135 million nodes and 1 billion edges. Both algorithms compute accurate results in reasonable time on the tested instances.