Three Complementary Approaches to Parallelization of Local BLAST Service on Workstation Clusters (invited paper)

Authors:
Kevin T. Pedretti;Thomas L. Casavant;R. C. Braun;Todd E. Scheetz;C. L. Birkett;Chad A. Roberts
Venue:
PaCT '99: Proceedings of the 5th International Conference on Parallel Computing Technologies
Year:
1999

Abstract

This paper describes approaches to improving the performance of one of the most common and increasingly important operations in the Human Genome Project (HGP): large-volume, batch comparison of DNA sequence data. This basic comparison, usually carried out by the well-known BLAST program on one subject sequence against the internationally available databases of over 3 million target sequences, is already performed hundreds of thousands of times each day by researchers around the world. At present, it is still used primarily in single-query or small-batch mode. As the sequencing of the entire human genome nears completion, functional genomics and the use of microarrays of sets of genes are coming to the fore. These developments will demand ever more efficient means of BLASTing sets of data, at volumes that will make single-processor implementations infeasible even on powerful workstations. We describe three primary levels at which BLAST can be parallelized. The first is the sequence-to-sequence comparison itself. The second parallelizes a single query across a partitioned and distributed database. The third partitions the set of queries across a set of servers with replicated or partitioned databases. The three methods may be employed alone or in concert. We describe our current implementation, which parallelizes batch requests, and our plans for implementing the other levels. The results will ultimately be applied to hardware assistance for this soon-to-be-primitive computer operation.
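As a rough illustration of the second and third levels named in the abstract, the following Python sketch fans a single query out across database shards and splits a batch of queries across workers that each see the full database. It is not the authors' implementation: `toy_score` is a hypothetical stand-in for BLAST (it only counts matching positions), threads stand in for cluster nodes, and all function names are assumptions made for this example. The first level (parallelism inside one sequence-to-sequence comparison) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_score(query, target):
    """Hypothetical stand-in for BLAST: count positions where the sequences agree."""
    return sum(1 for a, b in zip(query, target) if a == b)


def partition(items, n):
    """Split items into n round-robin chunks (some may be empty)."""
    return [items[i::n] for i in range(n)]


def search_shard(query, shard):
    """Return the best (score, target) pair for one query within one shard."""
    return max(((toy_score(query, t), t) for t in shard), default=(-1, ""))


def search_partitioned_db(query, shards, pool):
    """Level 2: one query searched against every database shard, then merged."""
    partial = pool.map(lambda shard: search_shard(query, shard), shards)
    return max(partial)  # the best of the per-shard bests is the global best


def search_batch(queries, database, n_workers=2):
    """Level 3: a batch of queries split across workers with a replicated database."""
    def run_chunk(chunk):
        return [(q, search_shard(q, database)) for q in chunk]

    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for chunk_result in pool.map(run_chunk, partition(queries, n_workers)):
            results.update(chunk_result)
    return results
```

In this sketch the two levels compose naturally: each worker in `search_batch` could call `search_partitioned_db` instead of `search_shard`, which corresponds to the abstract's remark that the methods may be used alone or in concert.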