MPI framework for parallel searching in large biological databases

Authors: Dominic Battré; David Sigfredo Angulo
Affiliations: DePaul University, Chicago, USA; DePaul University, Chicago, USA
Venue: Journal of Parallel and Distributed Computing
Year: 2006

Abstract

In this paper, we address the problem of searching very large biological databases, at least several gigabytes in size, by means of parallel processing. Biological databases storing DNA sequences, protein sequences, or mass spectra are growing exponentially, and searches through them consume exponentially growing computational resources as well. We present a general-purpose, MPI-based C++ framework for generically splitting databases among several computational nodes. The combined RAM of the nodes working in tandem is often sufficient to hold the entire database in memory, so it can be searched efficiently without paging to disk. The framework runs as a persistent service that processes all submitted queries, which allows queries to be reordered and memory to be used more effectively. As a result, we achieve superlinear speedups compared with single-processor implementations. We demonstrate the utility and speedup of the framework using a real biological database and an actual search algorithm for mass spectrometry.
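The split-and-search idea can be illustrated with a minimal single-process sketch. It assumes that each query is scored against every database entry and that only the k best hits are wanted. The names `Entry`, `Hit`, `partition`, `score_entry`, `search_slice`, and `parallel_search`, and the toy character-match score, are hypothetical stand-ins. They are not the framework's API or its mass-spectrometry scoring. In an MPI deployment, each slice would presumably reside in the memory of one rank, the per-slice search would run on that rank, and the local top-k lists would be gathered (for example with `MPI_Gather`) before the final merge. Here the "nodes" are simply simulated by a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical database record (e.g., one peptide sequence).
struct Entry {
    std::size_t id;
    std::string sequence;
};

// Score of one entry against a query.
struct Hit {
    std::size_t id;
    double score;
};

// Split the database into num_nodes contiguous slices whose sizes differ by at most one.
// In an MPI setting, slice r would be loaded into the RAM of rank r.
std::vector<std::vector<Entry>> partition(const std::vector<Entry>& db, std::size_t num_nodes) {
    assert(num_nodes > 0);
    std::vector<std::vector<Entry>> parts(num_nodes);
    std::size_t base = db.size() / num_nodes, extra = db.size() % num_nodes, pos = 0;
    for (std::size_t r = 0; r < num_nodes; ++r) {
        std::size_t len = base + (r < extra ? 1 : 0);
        auto first = db.begin() + static_cast<std::ptrdiff_t>(pos);
        parts[r].assign(first, first + static_cast<std::ptrdiff_t>(len));
        pos += len;
    }
    return parts;
}

// Toy placeholder score: number of positions where query and sequence agree.
double score_entry(const std::string& query, const std::string& seq) {
    std::size_t n = std::min(query.size(), seq.size());
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        if (query[i] == seq[i]) s += 1.0;
    return s;
}

// Higher score first; ties broken by smaller id so results are deterministic.
bool better(const Hit& a, const Hit& b) {
    return a.score > b.score || (a.score == b.score && a.id < b.id);
}

// Keep only the k best hits, sorted best-first.
std::vector<Hit> top_k(std::vector<Hit> hits, std::size_t k) {
    auto mid = hits.begin() + static_cast<std::ptrdiff_t>(std::min(k, hits.size()));
    std::partial_sort(hits.begin(), mid, hits.end(), better);
    hits.erase(mid, hits.end());
    return hits;
}

// Work done by one node: score every entry of its slice and return its local top k.
std::vector<Hit> search_slice(const std::vector<Entry>& slice, const std::string& query, std::size_t k) {
    std::vector<Hit> hits;
    hits.reserve(slice.size());
    for (const Entry& e : slice) hits.push_back({e.id, score_entry(query, e.sequence)});
    return top_k(std::move(hits), k);
}

// Gather every node's local top k, then merge them into the global top k.
std::vector<Hit> parallel_search(const std::vector<std::vector<Entry>>& parts,
                                 const std::string& query, std::size_t k) {
    std::vector<Hit> gathered;
    for (const auto& slice : parts) {  // simulated nodes; one MPI rank each in practice
        std::vector<Hit> local = search_slice(slice, query, k);
        gathered.insert(gathered.end(), local.begin(), local.end());
    }
    return top_k(std::move(gathered), k);
}
```

Every global top-k hit is also among the top-k hits of the slice that holds it. Merging the per-slice lists therefore returns the same answer as a single-node search, while each node only stores and scans about 1/N of the database.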