Efficient algorithms for large-scale local triangle counting

Authors:
Luca Becchetti;Paolo Boldi;Carlos Castillo;Aristides Gionis
Affiliations:
“Sapienza” Università di Roma, Rome, Italy;Università degli Studi di Milano, Milan, Italy;Yahoo! Research, Barcelona, Spain;Yahoo! Research, Barcelona, Spain
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2010

Citing 31
Cited 4

Matrix multiplication via arithmetic progressions

Journal of Symbolic Computation - Special issue on computational algebraic complexity
Directed triangles in directed graphs

Discrete Mathematics
Min-wise independent permutations (extended abstract)

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Syntactic clustering of the Web

Selected papers from the sixth international conference on World Wide Web
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
A small approximately min-wise independent family of hash functions

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Computing on data streams

External memory algorithms
External memory algorithms and data structures: dealing with massive data

ACM Computing Surveys (CSUR)
Reductions in streaming algorithms, with an application to counting triangles in graphs

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Identifying and Filtering Near-Duplicate Documents

COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
SimRank: a measure of structural-context similarity

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
On the Resemblance and Containment of Documents

SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
An improved data stream algorithm for frequency moments

SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
The webgraph framework I: compression techniques

Proceedings of the 13th international conference on World Wide Web
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages

Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Leveraging Social Networks to Fight Spam

Computer
Scaling link-based similarity search

WWW '05 Proceedings of the 14th international conference on World Wide Web
The indexable web is more than 11.5 billion pages

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Probability and Computing: Randomized Algorithms and Probabilistic Analysis

Probability and Computing: Randomized Algorithms and Probabilistic Analysis
Graphs over time: densification laws, shrinking diameters and possible explanations

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Discovering large dense subgraphs in massive graphs

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Trading off space for passes in graph streaming problems

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Counting triangles in data streams

Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A reference collection for web spam

ACM SIGIR Forum
Know your neighbors: web spam detection using the web topology

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient semi-streaming algorithms for local triangle counting in massive graphs

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Main-memory triangle computations for very large (sparse (power-law)) graphs

Theoretical Computer Science
Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Mining Large Networks with Subgraph Counting

ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
Estimating clustering indexes in data streams

ESA'07 Proceedings of the 15th annual European conference on Algorithms
Finding, counting and listing all triangles in large graphs, an experimental study

WEA'05 Proceedings of the 4th international conference on Experimental and Efficient Algorithms

SpamWatcher: a streaming social network analytic on the IBM wire-speed processor

Proceedings of the 5th ACM international conference on Distributed event-based system
Triangle listing in massive networks

ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on the Best of SIGKDD 2011
On the streaming complexity of computing local clustering coefficients

Proceedings of the sixth ACM international conference on Web search and data mining
Estimating clustering coefficients and size of social networks via random walk

Proceedings of the 22nd international conference on World Wide Web

Abstract

In this article, we study the problem of approximate local triangle counting in large graphs. Namely, given a large graph G = (V, E), we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. We consider the question for both undirected and directed graphs. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first contribution that addresses the problem of approximate local triangle counting with a focus on the efficiency issues arising in massive graphs, and that also considers the directed case. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help detect the presence of spamming activity in large-scale Web graphs, and can provide useful features for content quality assessment in social networks. For computing the local number of triangles (undirected and directed), we propose two approximation algorithms, which are based on the idea of min-wise independent permutations [Broder et al. 1998]. Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this article also uses O(|E|) space of external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results on large graphs, demonstrating the practical efficiency of our approach.
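The general idea behind min-wise estimation of local triangle counts can be illustrated with a minimal Python sketch. It is not the algorithm from the article, only an illustration under several assumptions: the function name `estimate_local_triangles` and its parameters are hypothetical, fully random labels stand in for a min-wise independent family of permutations, an in-memory dictionary of per-edge counters stands in for the O(|E|) external memory, and the number of passes is a free parameter rather than O(log |V|). Each pass scans the edge list twice: once to compute, for every node, the minimum label among its neighbours, and once to count the edges whose endpoints agree on that minimum. The fraction of agreeing passes estimates the Jaccard coefficient J of the two neighbourhoods, and the identity |N(u) ∩ N(v)| = J / (1 + J) · (|N(u)| + |N(v)|) converts it into an estimate of the number of common neighbours.

```python
import random
from collections import defaultdict


def estimate_local_triangles(edges, num_passes=64, seed=0):
    """Estimate the number of triangles at each node of an undirected graph.

    Illustrative sketch only: random float labels replace a min-wise
    independent permutation family, and per-edge counters are kept in a
    dict (an assumption standing in for external memory).
    """
    rng = random.Random(seed)
    # Normalise to a simple undirected edge list (no loops, no duplicates).
    edges = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})

    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    matches = defaultdict(int)  # per-edge count of passes with equal minima
    for _ in range(num_passes):
        label = {node: rng.random() for node in degree}
        min_label = {node: float("inf") for node in degree}
        # Scan 1: minimum label over each node's neighbourhood, O(|V|) state.
        for u, v in edges:
            min_label[u] = min(min_label[u], label[v])
            min_label[v] = min(min_label[v], label[u])
        # Scan 2: endpoints agree with probability J(N(u), N(v)).
        for u, v in edges:
            if min_label[u] == min_label[v]:
                matches[(u, v)] += 1

    triangles = defaultdict(float)
    for u, v in edges:
        j = matches[(u, v)] / num_passes  # estimated Jaccard coefficient
        common = j / (1.0 + j) * (degree[u] + degree[v])  # est. |N(u) ∩ N(v)|
        # T(u) = 1/2 * sum over neighbours v of |N(u) ∩ N(v)|.
        triangles[u] += common / 2.0
        triangles[v] += common / 2.0
    return dict(triangles)
```

For example, calling `estimate_local_triangles` on the six edges of a complete graph on four nodes should return values close to 3 for every node, with the error shrinking as `num_passes` grows. Only the `label` and `min_label` tables need to be held per pass, which is what keeps the main-memory footprint proportional to the number of nodes in this sketch.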