An efficient MapReduce algorithm for counting triangles in a very large graph

Authors:
Ha-Myung Park; Chin-Wan Chung
Affiliations:
KAIST, Daejeon, South Korea; KAIST, Daejeon, South Korea
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

The triangle counting problem is a fundamental problem in various domains. Triangle counts are used to compute the clustering coefficient, transitivity, triangular connectivity, trusses, etc. The problem has been studied extensively in internal memory, but those algorithms do not scale to enormous graphs. In recent years, MapReduce has emerged as a de facto standard framework for processing large data through parallel computing. A MapReduce algorithm based on graph partitioning has been proposed for the problem. However, that algorithm redundantly generates a large amount of intermediate data, which overloads the network and prolongs the processing time. In this paper, we propose a new algorithm, also based on graph partitioning, that introduces a novel idea of triangle classification to count the number of triangles in a graph. The algorithm substantially reduces the duplication by classifying triangles into three types and processing each triangle differently according to its type. In the experiments, we compare the proposed algorithm with recent existing algorithms on both synthetic and real-world datasets composed of millions of nodes and billions of edges. The proposed algorithm outperforms the other algorithms in most cases. In particular, on a Twitter dataset, the proposed algorithm is more than twice as fast as existing MapReduce algorithms. Moreover, the performance gap increases as the graph becomes larger and denser.
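The classification idea can be illustrated with a small single-process sketch. It is not the paper's algorithm: the abstract does not specify how subproblems are laid out, so everything below is an assumption. The sketch uses a hypothetical partition function `color(v, rho) = v mod rho`. A triangle whose three vertices fall in one partition is treated as type 1, two partitions as type 2, and three partitions as type 3. Each triangle is routed to exactly one reducer keyed by its partition set, so no triangle is counted twice. The names `map_edge`, `reduce_key`, and `count_triangles` are illustrative, and the map, shuffle, and reduce phases are simulated with plain dictionaries rather than a real MapReduce runtime.

```python
from collections import defaultdict
from itertools import combinations


def color(v, rho):
    # Hypothetical partition function: vertex v goes to group v mod rho.
    return v % rho


def map_edge(u, v, rho):
    """Map phase: emit (subproblem key, edge) pairs for one undirected edge.

    A key is the sorted tuple of partitions a subproblem covers:
    (i,) for type-1, (i, j) for type-2, (i, j, k) for type-3 triangles.
    """
    if u == v:
        return  # self-loops cannot be part of a triangle
    a, b = color(u, rho), color(v, rho)
    edge = (min(u, v), max(u, v))
    if a == b:
        # Intra-partition edge: used by the type-1 subproblem of its own
        # partition and by every type-2 subproblem containing that partition.
        yield (a,), edge
        for x in range(rho):
            if x != a:
                yield tuple(sorted((a, x))), edge
    else:
        # Cross-partition edge: used by its type-2 subproblem and by every
        # type-3 subproblem containing both endpoint partitions.
        yield tuple(sorted((a, b))), edge
        for x in range(rho):
            if x not in (a, b):
                yield tuple(sorted((a, b, x))), edge


def reduce_key(key, edges, rho):
    """Reduce phase: count triangles whose partition set is exactly `key`."""
    edges = set(edges)  # drop duplicate input edges
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for u, v in edges:  # every stored edge satisfies u < v
        for w in adj[u] & adj[v]:
            # Visit each triangle once (u < v < w), and skip triangles that
            # belong to a different subproblem so nothing is counted twice.
            if w > v and {color(t, rho) for t in (u, v, w)} == set(key):
                count += 1
    return count


def count_triangles(edge_list, rho=3):
    groups = defaultdict(list)
    for u, v in edge_list:
        for key, edge in map_edge(u, v, rho):
            groups[key].append(edge)  # shuffle: group edges by subproblem
    return sum(reduce_key(key, edges, rho) for key, edges in groups.items())


if __name__ == "__main__":
    k4 = list(combinations(range(4), 2))  # complete graph on 4 vertices
    print(count_triangles(k4, rho=3))  # 4
```

In this sketch, an intra-partition edge is replicated to `rho` subproblems and a cross-partition edge to `rho - 1`. The filter in `reduce_key` is what keeps the count duplicate-free even though a subgraph can contain triangles of another type. The replication costs are properties of this illustrative keying scheme only and should not be read as the paper's figures.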