Random sampling with a reservoir
ACM Transactions on Mathematical Software (TOMS)
Matrix multiplication via arithmetic progressions
Journal of Symbolic Computation - Special issue on computational algebraic complexity
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
STOC '01 Proceedings of the thirty-third annual ACM symposium on Theory of computing
Algorithmics and applications of tree and graph searching
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Uniform hashing in constant time and linear space
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Stochastic models for the Web graph
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Graph indexing: a frequent structure-based approach
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
New streaming algorithms for counting triangles in graphs
COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics
Finding, counting and listing all triangles in large graphs, an experimental study
WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms
Estimating PageRank on graph streams
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Efficient semi-streaming algorithms for local triangle counting in massive graphs
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
DOULION: counting triangles in massive graphs with a coin
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Optimal sampling from sliding windows
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
ICALP '09 Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I
Estimating clustering indexes in data streams
ESA'07 Proceedings of the 15th annual European conference on Algorithms
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Efficient algorithms for large-scale local triangle counting
ACM Transactions on Knowledge Discovery from Data (TKDD)
Clustering coefficient queries on massive dynamic social networks
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Intractability of min- and max-cut in streaming graphs
Information Processing Letters
Counting triangles and the curse of the last reducer
Proceedings of the 20th international conference on World wide web
Estimating PageRank on graph streams
Journal of the ACM (JACM)
Triangle listing in massive networks and its applications
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Structural trend analysis for online social networks
Proceedings of the VLDB Endowment
Approximate counting of cycles in streams
ESA'11 Proceedings of the 19th European conference on Algorithms
Improved sampling for triangle counting with MapReduce
ICHIT'11 Proceedings of the 5th international conference on Convergence and hybrid information technology
Optimal sampling from sliding windows
Journal of Computer and System Sciences
gSketch: on query estimation in graph streams
Proceedings of the VLDB Endowment
Colorful triangle counting and a MapReduce implementation
Information Processing Letters
Graph sketches: sparsification, spanners, and subgraphs
PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
Counting arbitrary subgraphs in data streams
ICALP'12 Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming - Volume Part II
Triangle listing in massive networks
ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
Streaming algorithms measured in terms of the computed quantity
COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics
On the streaming complexity of computing local clustering coefficients
Proceedings of the sixth ACM international conference on Web search and data mining
A space efficient streaming algorithm for triangle counting using the birthday paradox
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Estimating clustering coefficients and size of social networks via random walk
Proceedings of the 22nd international conference on World Wide Web
How hard is counting triangles in the streaming model?
ICALP'13 Proceedings of the 40th international conference on Automata, Languages, and Programming - Volume Part I
Counting and sampling triangles from a graph stream
Proceedings of the VLDB Endowment
Load balanced clustering coefficients
Proceedings of the first workshop on Parallel programming for analytics applications
We present two space-bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges. Our first algorithm makes no assumptions about the order of edges in the stream. It uses space that is inversely related to the ratio between the number of triangles and the number of triples with at least one edge in the induced subgraph, and constant expected update time per edge. Our second algorithm is designed for incidence streams (all edges incident to the same vertex appear consecutively). It uses space that is inversely related to the ratio between the number of triangles and the number of length-2 paths in the graph, and expected update time O(log|V|⋅(1+s⋅|V|/|E|)), where s is the space requirement of the algorithm. These results significantly improve over previous work [20, 8]. Since the space complexity depends only on the structure of the input graph and not on the number of nodes, our algorithms scale very well with increasing graph size, and so they provide a basic tool for analyzing the structure of large graphs. They have many applications, for example, in the discovery of Web communities, the computation of clustering and transitivity coefficients, and the discovery of frequent patterns in large graphs.
We have implemented both algorithms and evaluated their performance on networks from different application domains. The sizes of the considered graphs varied from about 8,000 nodes and 40,000 edges to 135 million nodes and more than 1 billion edges. For both algorithms we ran experiments with parameter s=1,000, 10,000, 100,000, 1,000,000 to evaluate running time and approximation quality. Both algorithms appear to be time efficient for these sample sizes. The approximation quality of the first algorithm varied significantly, and even for s=1,000,000 we had more than 10% deviation for more than half of the instances. The second algorithm performed much better, and even for s=10,000 we had an average deviation of less than 6% (taken over all but the largest instance, for which we could not compute the number of triangles exactly).
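The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way an arbitrary-order, one-pass triangle estimator of this kind can work, not the authors' implementation. It assumes a simple graph (no self-loops, no repeated edges) and a vertex set known in advance; the function name `estimate_triangles` and its parameters are hypothetical. Each of the s estimators keeps one uniformly sampled edge (a, b) via reservoir sampling plus a random third vertex v, and records whether the edges (a, v) and (b, v) arrive later in the stream. This naive version spends O(s) time per edge and so does not achieve the constant expected update time stated above.

```python
import random


def estimate_triangles(edge_stream, vertices, s, seed=None):
    """Estimate the triangle count of a simple undirected graph in one pass.

    Illustrative sketch: s independent estimators, each holding one
    reservoir-sampled edge and one random third vertex.
    """
    rng = random.Random(seed)
    vertices = list(vertices)
    n = len(vertices)
    if n < 3:
        raise ValueError("need at least three vertices")

    # Per estimator: [a, b, v, seen_av, seen_bv]
    estimators = [None] * s
    m = 0  # number of edges seen so far

    for x, y in edge_stream:
        m += 1
        edge = frozenset((x, y))
        for i in range(s):
            # Reservoir sampling of size 1: keep the m-th edge with prob. 1/m.
            if rng.random() < 1.0 / m:
                v = rng.choice(vertices)
                while v == x or v == y:
                    v = rng.choice(vertices)
                estimators[i] = [x, y, v, False, False]
            else:
                a, b, v, _, _ = estimators[i]
                if edge == frozenset((a, v)):
                    estimators[i][3] = True
                elif edge == frozenset((b, v)):
                    estimators[i][4] = True

    # A triangle is detected only via its first edge in stream order together
    # with the opposite vertex, i.e. by exactly one of the m * (n - 2)
    # equally likely (edge, vertex) pairs.
    hits = sum(1 for e in estimators if e is not None and e[3] and e[4])
    return hits / s * m * (n - 2)
```

For example, `estimate_triangles([(0, 1), (1, 2), (0, 2)], range(3), s=20000, seed=7)` should return a value close to 1. The quantity m⋅(n−2) equals the number of (edge, vertex) pairs, which is why a small ratio of triangles to such pairs calls for a larger s.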