BLAST++: a tool for BLASTing queries in batches

  • Authors:
  • Hao Wang; Twee-Hee Ong; Beng Chin Ooi; Kian-Lee Tan

  • Affiliations:
  • Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore; Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore; Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore; Department of Computer Science, National University of Singapore, 3 Science Drive 2, Singapore 117543 and Genome Institute of Singapore, 1 Science Park Road, The Capricorn #05-01, Singapore

  • Venue:
  • APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
  • Year:
  • 2003

Abstract

BLAST is the standard tool that molecular biologists use to search for sequence similarity in genomic and protein databases. It takes a brute-force approach, comparing a query sequence against every database sequence: for each pair of sequences to be matched, BLAST searches for short fixed-length word pairs (seeds) in the sequences and then extends them into higher-scoring regions. To search multiple queries, the basic approach is to run BLAST on each query, one at a time. This is clearly inefficient, because it fails to exploit common subsequences that the queries may share. In this paper, we propose a new genome search tool, BLAST++, that allows multiple queries (say, K) to be searched against a database concurrently. The design of BLAST++ is based on our observation that the seed-searching step of BLAST is a bottleneck that consumes more than 80% of the total response time. BLAST++ essentially treats a collection of queries as a single virtual query, so that the seed-searching step needs to be performed only once for common subsequences. We implemented BLAST++ as an extension of NCBI BLAST and evaluated its performance. Our study shows that the results obtained by BLAST++ are identical to those obtained by executing BLAST on each of the K queries, but the single-process version of BLAST++ completes the processing in a much shorter time: only about 25% of the time taken by the original single-process version of NCBI BLAST.
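
The "single virtual query" idea can be illustrated with a small sketch. It is not the BLAST++ implementation, which extends NCBI BLAST. It assumes exact-match seeds of a fixed word length `w` and skips the extension step entirely. All function names are hypothetical. The batched version builds one word index over all K queries, so each database sequence is scanned once and a word shared by several queries is looked up only once per database position.

```python
from collections import defaultdict


def build_shared_word_index(queries, w):
    """Map each length-w word to every (query_id, offset) where it occurs.

    A word shared by several queries gets a single index entry that
    lists all of its occurrences across the batch.
    """
    index = defaultdict(list)
    for qid, query in enumerate(queries):
        for i in range(len(query) - w + 1):
            index[query[i:i + w]].append((qid, i))
    return index


def find_seeds_batched(queries, subject, w):
    """Scan the subject sequence once and collect seeds for all queries.

    Returns {query_id: [(query_offset, subject_offset), ...]}.
    """
    index = build_shared_word_index(queries, w)
    seeds = defaultdict(list)
    for j in range(len(subject) - w + 1):
        # One lookup per subject position serves every query in the batch.
        for qid, i in index.get(subject[j:j + w], ()):
            seeds[qid].append((i, j))
    return dict(seeds)


def find_seeds_one_by_one(queries, subject, w):
    """Baseline: rescan the subject separately for each query."""
    result = {}
    for qid, query in enumerate(queries):
        hits = find_seeds_batched([query], subject, w).get(0)
        if hits:
            result[qid] = hits
    return result


if __name__ == "__main__":
    queries = ["ACGTACGT", "TTACGTAA"]  # toy queries sharing the word ACGT
    subject = "GGACGTACC"               # toy database sequence
    batched = find_seeds_batched(queries, subject, w=4)
    baseline = find_seeds_one_by_one(queries, subject, w=4)
    print(batched == baseline)
```

In this toy setting both functions return the same seeds, mirroring the abstract's claim that batching does not change the results. The baseline scans the subject K times, while the batched version scans it once.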