Maximal clique enumeration for large graphs on Hadoop framework
Proceedings of the first workshop on Parallel programming for analytics applications
Clique detection and analysis is one of the fundamental problems in graph theory. However, as graphs grow larger (e.g., those of social networks), existing sequential algorithms can no longer handle such analysis because of computation and memory limitations. In this paper, we present a distributed algorithm, dMaximalCliques, which can obtain clique information from million-node graphs within a few minutes on an 80-node computer cluster. dMaximalCliques is designed for shared-nothing systems, such as racks of clusters. We use very large real and synthetic graphs in the experimental studies to demonstrate the efficiency of the algorithm. In addition, we propose the distribution of maximal clique sizes in a graph (Maximal Clique Distribution) as a new measure for characterizing the structural properties of a graph and for distinguishing different types of graphs. We also find that this distribution is well fitted by a lognormal distribution.
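To make the Maximal Clique Distribution concrete, the following is a minimal single-machine sketch in Python. It is not dMaximalCliques: it uses the classical sequential Bron-Kerbosch algorithm with pivoting as a stand-in, and the helper names (`build_adjacency`, `maximal_cliques`, `maximal_clique_distribution`) and the toy edge list are assumptions made for illustration only.

```python
from collections import Counter


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique as a frozenset (sequential Bron-Kerbosch with pivoting)."""

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r, x: already-explored vertices.
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbors in p to prune redundant branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def maximal_clique_distribution(adj):
    """Map each maximal clique size to the number of maximal cliques of that size."""
    return Counter(len(clique) for clique in maximal_cliques(adj))


if __name__ == "__main__":
    # Toy graph (hypothetical): a triangle 0-1-2 with a path 2-3-4 attached.
    adj = build_adjacency([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
    print(sorted(maximal_clique_distribution(adj).items()))  # [(2, 2), (3, 1)]
```

On the toy graph the maximal cliques are {0, 1, 2}, {2, 3}, and {3, 4}, so the distribution has two cliques of size 2 and one of size 3. A sequential enumeration like this is exactly what becomes impractical on million-node graphs, which is the motivation for a distributed approach.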