Maximal clique enumeration for large graphs on hadoop framework

Authors:
Naga Shailaja Dasari;Desh Ranjan;Zubair Mohammad
Affiliations:
Old Dominion University, Norfolk, VA, USA;Old Dominion University, Norfolk, VA, USA;Old Dominion University, Norfolk, VA, USA
Venue:
Proceedings of the first workshop on Parallel programming for analytics applications
Year:
2014

Abstract

The maximal clique enumeration (MCE) problem for very large graphs arises in many critical applications, such as community detection in social networks, aligning 3D protein sequences, finding motifs in genomic data, identifying co-expressed genes, and data analytics in communication networks. Graphs with billions of nodes and edges are not unusual in these applications. The MCE problem is NP-hard, but a number of sequential and parallel algorithms have been proposed that work efficiently on real graphs. In addition to the large input graphs, MCE algorithms generally produce large intermediate data, which makes efficient processing even more challenging. A recently proposed approach, referred to as pbitMCE, has been shown to perform as well as or better than existing approaches. It relies on a degeneracy ordering of the vertices, which plays a vital role in the performance of the algorithm. A degeneracy ordering can be generated in linear time. However, it is challenging to compute in a distributed environment because it requires extensive communication between the nodes, and in some cases generating the ordering can take a significant amount of time. In such cases a different ordering, such as ordering by degree, can be a better choice than the degeneracy ordering. In this paper we experimentally study the impact of various vertex orderings on the performance of an MCE algorithm in the context of the MapReduce framework. We present a MapReduce implementation of pbitMCE that takes a large graph and an ordering of the vertices as input and enumerates all maximal cliques. To support the study, we present experimental results on various graphs using different orderings. The results show that the degree ordering performs comparably to the degeneracy ordering in most cases, but performs worse on large graphs.
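
As an illustration of the two orderings compared in the abstract, the following single-machine Python sketch computes a degeneracy ordering (by repeatedly removing a minimum-degree vertex, using degree buckets so the peeling runs in linear time) and a degree ordering, then feeds either one to a plain Bron-Kerbosch enumeration with pivoting. This is not the authors' pbitMCE or their MapReduce implementation: the function names (`build_adjacency`, `degeneracy_ordering`, `degree_ordering`, `maximal_cliques`) and the example graph are assumptions made for illustration only. Each per-vertex subproblem in `maximal_cliques` is independent of the others, which is the kind of unit a distributed version could plausibly hand to separate workers.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degree_ordering(adj):
    """Order vertices by increasing degree (ties broken by vertex id)."""
    return sorted(adj, key=lambda v: (len(adj[v]), v))


def degeneracy_ordering(adj):
    """Repeatedly remove a vertex of minimum remaining degree."""
    degree = {v: len(adj[v]) for v in adj}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    order, removed, d = [], set(), 0
    for _ in range(len(adj)):
        # After one removal the minimum degree can drop by at most 1.
        d = max(d - 1, 0)
        while not buckets[d]:
            d += 1
        v = buckets[d].pop()
        order.append(v)
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                buckets[degree[w]].remove(w)
                degree[w] -= 1
                buckets[degree[w]].add(w)
    return order


def maximal_cliques(adj, order):
    """Enumerate all maximal cliques, one subproblem per vertex in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-covered vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    for v in order:
        later = {w for w in adj[v] if pos[w] > pos[v]}
        earlier = {w for w in adj[v] if pos[w] < pos[v]}
        expand({v}, later, earlier)  # independent subproblem for v
    return cliques


if __name__ == "__main__":
    # Hypothetical example: two triangles joined by the edge (3, 4).
    edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
    adj = build_adjacency(edges)
    for name, ordering in (("degeneracy", degeneracy_ordering),
                           ("degree", degree_ordering)):
        order = ordering(adj)
        print(name, order, sorted(maximal_cliques(adj, order)))
```

Both orderings produce the same set of maximal cliques; what changes is the size of each per-vertex candidate set. Under a degeneracy ordering, no vertex has more later neighbors than the degeneracy of the graph, whereas a degree ordering is cheaper to obtain but gives no such bound.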