Three Complementary Approaches to Parallelization of Local BLAST Service on Workstation Clusters (invited paper)

  • Authors:
  • Kevin T. Pedretti;Thomas L. Casavant;R. C. Braun;Todd E. Scheetz;C. L. Birkett;Chad A. Roberts

  • Venue:
  • PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
  • Year:
  • 1999

Abstract

This paper describes approaches to improving the performance of one of the most common and increasingly important aspects of the Human Genome Project (HGP): large-volume, batch comparison of DNA sequence data. This basic comparison operation, usually carried out by the well-known BLAST program on one subject sequence against the internationally available databases of over 3 million target sequences, is already used hundreds of thousands of times each day by researchers around the world. At present, it is still used primarily in single-query or small-batch query mode. As the entire sequence of the human genome nears completion, the area of functional genomics, and the use of microarrays of sets of genes, is coming to the fore. These developments will demand ever more efficient means of BLASTing sets of data that will make single-processor implementation on powerful workstations infeasible. We describe the three primary parallel components of BLAST. The first is at the sequence-to-sequence comparison level. The second parallelizes a single query across a partitioned and distributed database. Finally, the set of queries itself is partitioned across a set of servers with replicated or partitioned databases. The three methods may be employed alone or in concert. We describe our current implementation, which parallelizes batch requests, and our plans for implementing the other levels. The results will ultimately be applied to hardware assistance for this soon-to-be primitive computer operation.
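
The second and third levels named in the abstract (one query over a partitioned database, and a batch of queries spread over servers) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it does not call BLAST: `toy_score` is a hypothetical stand-in that counts shared k-mers, threads stand in for cluster nodes, and all function names and sample sequences are assumptions made for illustration. The first level (parallelism inside a single sequence-to-sequence comparison) is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_score(query, target, k=3):
    """Hypothetical stand-in for BLAST: count distinct k-mers shared by two sequences."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(query) & kmers(target))


def search_partition(query, partition):
    """Score one query against every target in one database partition."""
    return [(toy_score(query, seq), tid) for tid, seq in partition.items()]


def rank(hits, top):
    """Drop zero-score hits, then sort by descending score and target id."""
    kept = [h for h in hits if h[0] > 0]
    return sorted(kept, key=lambda h: (-h[0], h[1]))[:top]


def search_serial(query, database, top=2):
    """Baseline: one worker scans the whole database."""
    return rank(search_partition(query, database), top)


def search_partitioned_db(query, partitions, top=2):
    """Second level: one query, database split across workers, hits merged."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        per_partition = list(pool.map(lambda p: search_partition(query, p), partitions))
    merged = [hit for part in per_partition for hit in part]
    return rank(merged, top)


def search_batch(queries, database, workers=2, top=1):
    """Third level: queries split across workers, each holding a full database replica."""
    names = list(queries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda n: search_serial(queries[n], database, top), names))
    return dict(zip(names, results))


# Illustrative data only; the sequences are made up.
database = {"t1": "ACGTACGTGA", "t2": "TTTTGGGGCC", "t3": "ACGTTTTTGA"}
partitions = [{"t1": database["t1"], "t2": database["t2"]}, {"t3": database["t3"]}]
queries = {"q1": "ACGTACG", "q2": "GGGGCC"}

if __name__ == "__main__":
    print(search_partitioned_db("ACGTACG", partitions))
    print(search_batch(queries, database))
```

Because the merge step ranks the combined hits exactly as the serial baseline does, the partitioned search returns the same answer as a single-node search in this sketch. With a real BLAST engine, scores that depend on database size, such as E-values, would need to be computed against the full database size rather than the partition size.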