A scalable, parallel algorithm for maximal clique enumeration

Authors:
Matthew C. Schmidt; Nagiza F. Samatova; Kevin Thomas; Byung-Hoon Park
Affiliations:
Computer Science Department, North Carolina State University, Raleigh, NC 27695, United States and Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, Un ...; Computer Science Department, North Carolina State University, Raleigh, NC 27695, United States and Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, Un ...; Cray, Inc., United States; Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

The problem of maximal clique enumeration (MCE) is to enumerate all of the maximal cliques in a graph. Once enumerated, maximal cliques are widely used to solve problems in areas such as 3-D protein structure alignment, genome mapping, gene expression analysis, and detection of social hierarchies. Even the most efficient serial MCE algorithms take a long time to enumerate the maximal cliques in the networks arising from these problems, which contain hundreds, thousands, or more vertices. Previous attempts to provide practical parallel solutions to the MCE problem have had limited success, largely because of challenges inherent in the MCE combinatorial search space. On the one hand, MCE algorithms often create a backtracking search tree whose structure is highly irregular and hard or impossible to predict; therefore, almost any static decomposition of the search tree among parallel processors results in highly unbalanced processor execution times. On the other hand, the data-intensive nature of the MCE problem often makes naive dynamic load distribution strategies that require extensive data movement prohibitively expensive. As a result, good scaling of the overall execution time of parallel MCE algorithms has been reported only up to a couple of hundred processors. In this paper, we propose a parallel, scalable, and memory-efficient MCE algorithm for distributed and/or shared memory high performance computing architectures, whose runtime scales linearly up to thousands of processors on real-world application graphs with hundreds to thousands of nodes.
Its scalability and efficiency are attributed to four proposed components: (a) a representation of the search tree decomposition that enables parallelization; (b) a parallel depth-first backtracking search that both constrains the search space and minimizes memory requirements; (c) least stringent synchronization to minimize data movement; and (d) on-demand work stealing, intelligently coupled with work stack splitting, to minimize the idle time of computing elements. To the best of our knowledge, the proposed parallel MCE algorithm is the first to achieve linear runtime scaling using up to 2048 processors on Cray XT machines for a number of real-world biological networks.
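To make components (b) and (d) more concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. The graph is an adjacency dictionary. Each search-tree node is a hypothetical `(R, P, X)` triple in the style of the classic serial Bron-Kerbosch algorithm with pivoting. Parallel workers are simulated sequentially in rounds. Each worker runs a depth-first search from its own explicit work stack. An idle worker steals the bottom half of the deepest stack, which holds the entries nearest the root and therefore tends to carry the largest subtrees. The stealing policy, the round-based scheduling, and the names `children` and `parallel_mce_simulation` are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]
# Hypothetical search-tree node (R, P, X): current clique, candidates, excluded vertices.
Node = Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]


def children(graph: Graph, node: Node) -> List[Node]:
    """Expand one node into independent child subproblems (Bron-Kerbosch with pivoting)."""
    r, p, x = node
    # Pivot u maximizes |P & N(u)|; only vertices outside N(u) need their own branch.
    pivot = max(p | x, key=lambda u: len(p & graph[u]))
    kids: List[Node] = []
    for v in sorted(p - graph[pivot]):
        nbrs = frozenset(graph[v])
        kids.append((r | {v}, p & nbrs, x & nbrs))
        p = p - {v}  # later siblings must not add v again ...
        x = x | {v}  # ... but must still test maximality against it
    return kids


def parallel_mce_simulation(
    graph: Graph, num_workers: int
) -> Tuple[List[FrozenSet[int]], List[int]]:
    """Sequentially simulate workers doing depth-first MCE with work stealing."""
    stacks: List[List[Node]] = [[] for _ in range(num_workers)]
    stacks[0].append((frozenset(), frozenset(graph), frozenset()))  # root on worker 0
    cliques: List[FrozenSet[int]] = []
    steps = [0] * num_workers  # number of nodes each worker expanded
    while any(stacks):
        for w in range(num_workers):  # one round: each worker takes at most one step
            if not stacks[w]:
                # Assumed policy: an idle worker targets the deepest stack
                victim = max(range(num_workers), key=lambda i: len(stacks[i]))
                k = len(stacks[victim]) // 2
                if k == 0:
                    continue  # nothing worth splitting; stay idle this round
                # Work stack splitting: take the bottom half (nodes nearest the root)
                stacks[w] = stacks[victim][:k]
                stacks[victim] = stacks[victim][k:]
            r, p, x = stacks[w].pop()  # depth-first: expand the newest node
            steps[w] += 1
            if not p and not x:
                cliques.append(r)  # no candidates and nothing excluded: R is maximal
            else:
                stacks[w].extend(children(graph, (r, p, x)))
    return cliques, steps
```

For the graph `{1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}`, `parallel_mce_simulation(g, 4)` finds the two maximal cliques {1, 2, 3} and {3, 4}. Child subproblems are independent of one another, so moving them between stacks changes only which worker expands them, not the set of cliques found. The returned `steps` list shows how node expansions were spread across the simulated workers.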