BLAST++: a tool for BLASTing queries in batches

Authors:
Hao Wang;Twee-Hee Ong;Beng Chin Ooi;Kian-Lee Tan
Affiliations:
Department of Computer Science, National University of Singapore 3 Science Drive 2, Singapore;Department of Computer Science, National University of Singapore 3 Science Drive 2, Singapore;Department of Computer Science, National University of Singapore 3 Science Drive 2, Singapore;Department of Computer Science, National University of Singapore 3 Science Drive 2, Singapore 117543 and Genome Institute of Singapore, 1 Science Park Road, The Capricorn #05-01, Singapore
Venue:
APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
Year:
2003

Abstract

BLAST is the standard tool that molecular biologists use to search for sequence similarity in genomic (and protein) databases. It takes a brute-force approach, comparing a query sequence against every database sequence: for each pair of sequences to be matched, BLAST searches for short fixed-length word pairs (seeds) in the sequences and then extends them to higher-scoring regions. To search multiple queries, the basic approach is to run BLAST on each query, one at a time. This is clearly inefficient and fails to exploit common subsequences that the queries may share. In this paper, we propose a new genome search tool, BLAST++, that allows multiple queries, say K, to be searched against a database concurrently. The design of BLAST++ is based on our observation that the seed-searching step of BLAST is a bottleneck that consumes more than 80% of the total response time. BLAST++ essentially treats a collection of queries as a single virtual query, so that the seed-searching step needs to be performed only once for common subsequences. We implemented BLAST++ as an extension of NCBI BLAST and evaluated its performance. Our study shows that the results obtained by BLAST++ are identical to those obtained by executing BLAST on each of the K queries, but the single-process version of BLAST++ completes the processing in a much shorter time: only about 25% of the time taken by the original single-process version of NCBI BLAST.
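
The "single virtual query" idea can be illustrated with a small Python sketch. It is not the BLAST++ implementation: it assumes exact-match words only (no neighborhood words or scoring matrix), skips the extension step, and uses hypothetical function names and toy sequences chosen for illustration. It shows only how one shared word index lets a database sequence be scanned once for all K queries while producing the same seeds as K separate scans.

```python
from collections import defaultdict


def build_batch_index(queries, w):
    """Map each length-w word to every (query_id, offset) where it occurs.

    Words shared by several queries get a single index entry, which is
    what lets the batch behave like one virtual query.
    """
    index = defaultdict(list)
    for qid, q in enumerate(queries):
        for i in range(len(q) - w + 1):
            index[q[i:i + w]].append((qid, i))
    return index


def find_seeds_batched(queries, subject, w=4):
    """Scan the subject once and collect seeds for all queries together."""
    index = build_batch_index(queries, w)
    seeds = defaultdict(list)
    for j in range(len(subject) - w + 1):
        # One lookup per subject word serves every query containing it.
        for qid, i in index.get(subject[j:j + w], ()):
            seeds[qid].append((i, j))
    return {qid: sorted(hits) for qid, hits in seeds.items()}


def find_seeds_one_by_one(queries, subject, w=4):
    """Baseline: rescan the subject separately for each query."""
    result = {}
    for qid, q in enumerate(queries):
        hits = find_seeds_batched([q], subject, w).get(0)
        if hits:
            result[qid] = hits
    return result


# Toy data (hypothetical): the first two queries share the word "GTAC".
queries = ["ACGTAC", "GTACGG", "TTTTTT"]
subject = "CCACGTACGGA"

batched = find_seeds_batched(queries, subject)
assert batched == find_seeds_one_by_one(queries, subject)
```

In this sketch each seed is a pair (query offset, subject offset), grouped by query index. The batched scan visits the subject once regardless of K, whereas the baseline visits it K times; the final assertion checks that both produce the same seeds for the toy data.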