Improvement of performance of MegaBlast algorithm for DNA sequence alignment

Authors:
Guang-Ming Tan;Lin Xu;Dong-Bo Bu;Sheng-Zhong Feng;Ning-Hui Sun
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China and Graduate University of Chinese Academy of Sciences, Beijing, P.R. China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China and Graduate University of Chinese Academy of Sciences, Beijing, P.R. China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P.R. China
Venue:
Journal of Computer Science and Technology
Year:
2006

Abstract

MegaBlast is one of the most important programs in the NCBI BLAST (Basic Local Alignment Search Tool) toolkit. However, MegaBlast is both computation-intensive and I/O-intensive, and its memory consumption is proportional to the product of the sizes of the query sequence set and the subject (database) sequence set. This paper proposes a new strategy for optimizing MegaBlast. The new strategy exchanges the query and subject sequence sets and builds the hash table on the new subject sequences. It overlaps I/O with computation, shortens the overall running time, and reduces memory cost, since memory use is now proportional only to the size of the subject sequence set. The optimized algorithm is well suited to parallelization on cluster systems. The parallel algorithm uses the query segmentation method. As our experiments show, the parallel program, implemented with MPI, scales well.
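The core idea of the abstract, indexing one sequence set in a hash table and streaming the other set past it, can be illustrated with a minimal sketch. This is not the authors' implementation and not NCBI code: the function names, the word size of 4, and the toy sequences are all assumptions chosen for illustration. The sketch covers only exact-word seed lookup; seed extension, scoring, and true overlap of I/O with computation (for example, with a prefetching thread) are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

WORD_SIZE = 4  # hypothetical word length, kept small for the toy example


def build_word_table(indexed: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every length-k word to the (sequence id, offset) pairs where it occurs.

    Only the indexed set is held in memory, so the table grows with that
    set alone, not with the size of the streamed set.
    """
    table: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for seq_id, seq in indexed.items():
        for pos in range(len(seq) - k + 1):
            table[seq[pos:pos + k]].append((seq_id, pos))
    return table


def find_seed_hits(
    table: Dict[str, List[Tuple[str, int]]],
    streamed: Iterable[Tuple[str, str]],
    k: int,
) -> Iterator[Tuple[str, int, str, int]]:
    """Yield (streamed id, streamed offset, indexed id, indexed offset) for each shared word.

    `streamed` may be any iterable, such as a generator reading a file,
    so only one streamed sequence needs to be resident at a time.
    """
    for stream_id, seq in streamed:
        for pos in range(len(seq) - k + 1):
            # .get avoids inserting empty entries into the defaultdict
            for idx_id, idx_pos in table.get(seq[pos:pos + k], ()):
                yield (stream_id, pos, idx_id, idx_pos)


# Toy data: one indexed sequence and one streamed sequence (both made up).
indexed_set = {"q1": "ACGTAC"}
streamed_set = [("s1", "TTACGTT")]

word_table = build_word_table(indexed_set, WORD_SIZE)
hits = list(find_seed_hits(word_table, streamed_set, WORD_SIZE))
```

In this toy run the only shared 4-letter word is ACGT, found at offset 2 of s1 and offset 0 of q1. Which set is indexed and which is streamed is the design choice the abstract describes: putting the table on one side bounds memory by that side alone.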