Massively parallel genomic sequence search on the Blue Gene/P architecture

Authors:
Heshan Lin; Pavan Balaji; Ruth Poole; Carlos Sosa; Xiaosong Ma; Wu-chun Feng
Affiliations:
North Carolina State University; Argonne National Laboratory; IBM; University of Minnesota, Minneapolis, MN; North Carolina State University; Virginia Tech
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem (sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes) in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.