Homology search is a key tool for understanding the role, structure, and biochemical function of genomic sequences. The most popular technique for rapid homology search is BLAST, which has been in widespread use within universities, research centers, and commercial enterprises since the early 1990s. In this paper, we propose a new step in the BLAST algorithm to reduce the computational cost of searching with negligible effect on accuracy. This new step, semigapped alignment, compromises between the efficiency of ungapped alignment and the accuracy of gapped alignment, allowing BLAST to accurately filter sequences with lower computational cost. In addition, we propose a heuristic, restricted insertion alignment, that avoids unlikely evolutionary paths with the aim of reducing gapped alignment cost with negligible effect on accuracy. Together, after including an optimization of the local alignment recursion, our two techniques more than double the speed of the gapped alignment stages in BLAST. We conclude that our techniques are an important improvement to the BLAST algorithm. Source code for the alignment algorithms is available for download at http://www.bsg.rmit.edu.au/iga/.
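For readers unfamiliar with the local alignment recursion mentioned above, the following is a minimal sketch of the standard affine-gap local alignment score computation (the textbook Smith-Waterman/Gotoh recursion). It is not the authors' optimized recursion, and it implements neither semigapped nor restricted insertion alignment; it only illustrates the baseline gapped alignment cost that such techniques aim to reduce. The function name, the scoring parameters, and their default values are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=1, mismatch=-3, gap_open=5, gap_extend=2):
    """Return the best local alignment score of strings a and b.

    Illustrative textbook recursion, not the paper's implementation.
    Assumed gap cost convention: a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n = len(b)
    prev_h = [0] * (n + 1)        # best score ending at (i-1, j)
    prev_f = [neg_inf] * (n + 1)  # best score ending at (i-1, j) in a gap that skips a[i-2]
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (n + 1)
        cur_f = [neg_inf] * (n + 1)
        e = neg_inf               # best score ending at (i, j) in a gap that skips b[j-1]
        for j in range(1, n + 1):
            # Open a new gap from the best score, or extend an existing gap.
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            # Substitution score for aligning a[i-1] with b[j-1].
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a score never drops below zero.
            cur_h[j] = max(0, prev_h[j - 1] + s, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


if __name__ == "__main__":
    # One inserted T in the second sequence; a single gap recovers all 8 matches.
    print(local_alignment_score("ACGTACGT", "ACGTTACGT",
                                match=5, mismatch=-4, gap_open=3, gap_extend=1))
```

Every cell of the dynamic programming matrix evaluates three candidate states (match, gap in one sequence, gap in the other), which is why gapped alignment costs considerably more per cell than ungapped extension. A heuristic that limits where gaps may open or which gap transitions are allowed reduces that per-cell work.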