An OpenMP algorithm and implementation for clustering biological graphs

  • Authors:
  • Timothy Chapman;Ananth Kalyanaraman

  • Affiliations:
  • University of California Santa Cruz, Santa Cruz, CA, USA;Washington State University, Pullman, WA, USA

  • Venue:
  • Proceedings of the first workshop on Irregular applications: architectures and algorithm
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Graph algorithms on parallel architectures present an interesting case study for irregular applications. Among the graph algorithms popular in scientific computing, graph clus tering or community detection has numerous applications in computational biology. However, this operation also poses serious computational challenges because of irregular memory access patterns, large memory requirements, and their dependence on other auxiliary (also irregular) data structures to supplement processing. In this paper, we address the problem of graph clustering on shared memory machines. We present a new OpenMP-based parallel algorithm called pClust-sm, which uses adjacency lists, hash tables and union-find data structures in parallel. The algorithm improves both the asymptotic runtime and memory complexities of a previous serial implementation. Preliminary results show that this algorithm can scale up to 8 threads (cores) of a shared memory machine on a real world metagenomics input graph with 1.2M vertices and 100M edges. More importantly, the new implementation drastically reduces the time to solution from the order of several hours to just over 4 minutes, and in addition, it enhances the problem size reach by at least one order of magnitude.