Parallel genomic sequence-search on a massively parallel system

Authors and affiliations:
Oystein Thorsen, IBM - Rochester, Rochester, MN
Brian Smith, IBM - Rochester, Rochester, MN
Carlos P. Sosa, University of Minnesota, Minneapolis, MN
Karl Jiang, IBM - Rochester, Rochester, MN
Heshan Lin, North Carolina State University, Raleigh, NC
Amanda Peters, IBM - Rochester, Rochester, MN
Wu-chun Feng, Virginia Tech, Blacksburg, VA
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Abstract

In the life sciences, genomic databases for sequence search have been growing exponentially in size. As a result, faster sequence-search algorithms continue to be developed to cope with the growing time complexity of searching these databases. The ubiquitous tool for such searches is the Basic Local Alignment Search Tool (BLAST) [1] from the National Center for Biotechnology Information (NCBI). Despite continued algorithmic improvements, BLAST cannot keep pace with the exponential growth of the databases. Therefore, parallel implementations such as mpiBLAST have emerged to address this problem. The performance of such implementations depends on a myriad of factors, including the algorithm, the architecture, and the mapping of the algorithm onto the architecture. This paper describes modifications and extensions to mpiBLAST-PIO, a parallel, distributed-memory version of BLAST. It also describes how mpiBLAST-PIO maps onto a massively parallel system, specifically the IBM Blue Gene/L (BG/L). The extensions include a virtual file manager, a "multiple master" run-time model, efficient fragment distribution, and intelligent load balancing. We show that our optimized mpiBLAST-PIO on BG/L scales to 8192 nodes (two cores per node) using a query of 28,014 sequences against the NR and NT databases. The cases tested here are well suited to a massively parallel system.