Massively parallel genomic sequence search on the Blue Gene/P architecture

  • Authors:
  • Heshan Lin;Pavan Balaji;Ruth Poole;Carlos Sosa;Xiaosong Ma;Wu-chun Feng

  • Affiliations:
  • North Carolina State University;Argonne National Laboratory;IBM;University of Minnesota, Minneapolis, MN;North Carolina State University;Virginia Tech

  • Venue:
  • Proceedings of the 2008 ACM/IEEE conference on Supercomputing
  • Year:
  • 2008

Abstract

This paper presents our first experiences in mapping genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform and optimizing it there. Specifically, we base our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks onto such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem (searching a microbial genome database against itself to support the discovery of missing genes in genomes) in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.
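
As a rough illustration of what a scaling-efficiency figure such as the 93% above expresses, the sketch below computes parallel efficiency relative to a baseline run. It assumes the common definition (speedup divided by the factor by which the core count grew). The abstract does not state which baseline or definition the authors used, and the function name and all timings here are hypothetical values chosen only to show the arithmetic.

```python
def parallel_efficiency(base_cores, base_seconds, cores, seconds):
    """Efficiency of a run relative to a baseline run.

    Speedup is the ratio of baseline time to scaled time; efficiency is
    that speedup divided by the factor by which the core count grew.
    """
    speedup = base_seconds / seconds
    scale = cores / base_cores
    return speedup / scale


# Hypothetical timings (NOT taken from the paper), used only for illustration.
base_cores, base_seconds = 1024, 32000.0
cores, seconds = 32768, 1075.0

eff = parallel_efficiency(base_cores, base_seconds, cores, seconds)
print(f"efficiency on {cores} cores: {eff:.1%}")
```

With perfectly linear scaling the function returns 1.0. Values below that reflect overheads such as the load-imbalance and I/O costs that the abstract identifies as critical performance issues.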