An OpenMP algorithm and implementation for clustering biological graphs

Authors:
Timothy Chapman;Ananth Kalyanaraman
Affiliations:
University of California Santa Cruz, Santa Cruz, CA, USA;Washington State University, Pullman, WA, USA
Venue:
Proceedings of the first workshop on Irregular applications: architectures and algorithms
Year:
2011

Abstract

Graph algorithms on parallel architectures present an interesting case study for irregular applications. Among the graph algorithms popular in scientific computing, graph clustering, or community detection, has numerous applications in computational biology. However, this operation also poses serious computational challenges because of its irregular memory access patterns, large memory requirements, and dependence on auxiliary (also irregular) data structures to supplement processing. In this paper, we address the problem of graph clustering on shared-memory machines. We present a new OpenMP-based parallel algorithm called pClust-sm, which uses adjacency lists, hash tables, and union-find data structures in parallel. The algorithm improves both the asymptotic runtime and memory complexities of a previous serial implementation. Preliminary results show that this algorithm scales up to 8 threads (cores) of a shared-memory machine on a real-world metagenomics input graph with 1.2M vertices and 100M edges. More importantly, the new implementation drastically reduces the time to solution from several hours to just over 4 minutes and extends the tractable problem size by at least one order of magnitude.
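To make the role of the union-find structure concrete, the following is a minimal sketch in C of merging vertices into clusters from an edge list inside an OpenMP loop. It is not the pClust-sm algorithm: the abstract does not describe how pClust-sm selects which vertices to merge or how it synchronizes updates, so this sketch simply assumes that every edge merges its two endpoints and that one named critical section guards all unions. The names `UnionFind`, `uf_create`, `uf_find`, `uf_union`, and `cluster_edges` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical disjoint-set structure: parent[v] == v marks a cluster root. */
typedef struct {
    int *parent;
    int n;
} UnionFind;

UnionFind uf_create(int n) {
    UnionFind uf;
    uf.n = n;
    uf.parent = malloc((size_t)n * sizeof(int));
    assert(uf.parent != NULL);
    for (int i = 0; i < n; i++)
        uf.parent[i] = i;               /* every vertex starts as its own cluster */
    return uf;
}

int uf_find(UnionFind *uf, int x) {
    while (uf->parent[x] != x) {
        uf->parent[x] = uf->parent[uf->parent[x]];  /* path halving */
        x = uf->parent[x];
    }
    return x;
}

void uf_union(UnionFind *uf, int a, int b) {
    int ra = uf_find(uf, a);
    int rb = uf_find(uf, b);
    if (ra == rb)
        return;
    /* Link the larger root under the smaller one, so each cluster's root is
       its minimum vertex id regardless of the order in which edges arrive. */
    if (ra < rb)
        uf->parent[rb] = ra;
    else
        uf->parent[ra] = rb;
}

/* Merge the endpoints of each of the m edges (src[i], dst[i]) and return the
   number of resulting clusters. Assumption: coarse-grained locking; all
   unions are serialized by one critical section, which is simple and correct
   but limits scalability. */
int cluster_edges(UnionFind *uf, const int *src, const int *dst, int m) {
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        #pragma omp critical(uf_lock)
        uf_union(uf, src[i], dst[i]);
    }

    int count = 0;
    for (int v = 0; v < uf->n; v++)
        if (uf_find(uf, v) == v)
            count++;
    return count;
}
```

Built with an OpenMP-enabled compiler (for example, the `-fopenmp` flag of GCC), the loop iterations are distributed across threads; without OpenMP support the pragmas are ignored and the same code runs serially with identical results. A scalable implementation would need finer-grained synchronization than the single critical section assumed here.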