On the design of high-performance algorithms for aligning multiple protein sequences on mesh-based multiprocessor architectures

Authors:
Diana H. P. Low; Bharadwaj Veeravalli; David A. Bader
Affiliations:
Department of Electrical and Computer Engineering, The National University of Singapore, 4 Engineering Drive 3, Singapore 117576, Singapore (Low, Veeravalli); College of Computing, Georgia Institute of Technology, Atlanta, Georgia, USA (Bader)
Venue:
Journal of Parallel and Distributed Computing
Year:
2007

Abstract

In this paper, we address the problem of multiple sequence alignment (MSA) for a very large number of protein sequences on mesh-based multiprocessor architectures. Because the problem has been conclusively shown to be computationally complex, we employ the divisible load paradigm (also referred to as divisible load theory, DLT) to handle such large numbers of sequences. We design an efficient computational engine that conducts MSAs by exploiting the parallelism embedded in the computational steps of multiple sequence alignment algorithms. Specifically, we use the standard Smith-Waterman (SW) algorithm in our implementation; however, our approach is not restricted to the SW class of algorithms, and the treatment in this paper applies generically to a class of similar dynamic programming problems. Our approach is recursive in the sense that the quality of the solution can be refined continuously until an acceptable level of quality is achieved. After the first phase of computation, a heuristic scheme that we design renders the final MSA solution. We conduct rigorous simulation experiments using several hundred homologous protein sequences derived from the Rattus norvegicus and Mus musculus olfactory receptor databases. We quantify performance using the speed-up metric relative to serial (single-machine) processing, and we further validate our findings by comparing against the conventional equal load partitioning (ELP) strategy commonly used in the parallel processing literature. Based on our extensive simulation study, we observe that the DLT paradigm offers excellent speed-up characteristics and opens avenues for its use in several other biological sequence processing problems. This study is the first attempt to use the DLT paradigm to devise efficient strategies for large-scale multiple protein sequence alignment on mesh-based multiprocessor systems.
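The two ingredients named in the abstract, the Smith-Waterman kernel and divisible-load partitioning compared against ELP, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a linear gap penalty with illustrative scores (`match=2`, `mismatch=-1`, `gap=-1`), and a simplified divisible-load model in which processor i needs `w[i]` seconds per unit of load, units are independent and equal-cost (for example, DP cells or pairwise comparisons of similar length), and communication cost and the mesh topology are ignored. Under that model, a DLT-style split gives each processor a fraction proportional to its speed 1/w_i so that all processors finish at the same time, whereas ELP gives every processor 1/p of the load. All function names and the processor speeds are hypothetical.

```python
from typing import List


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (textbook Smith-Waterman, linear gap)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: clamp at 0 so an alignment can restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def elp_fractions(w: List[float]) -> List[float]:
    """Equal load partitioning: every processor receives 1/p of the load."""
    return [1.0 / len(w)] * len(w)


def dlt_fractions(w: List[float]) -> List[float]:
    """Simplified DLT split (hypothetical model, no communication cost):
    alpha_i is proportional to speed 1/w_i, so all finish times are equal."""
    inv = [1.0 / wi for wi in w]
    total = sum(inv)
    return [x / total for x in inv]


def makespan(fractions: List[float], w: List[float], load: float) -> float:
    """Finish time of the slowest processor when processor i computes
    fractions[i] * load units at w[i] seconds per unit."""
    return max(a * load * wi for a, wi in zip(fractions, w))


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "ACG"))  # 6 (three matches)
    w = [1.0, 1.0, 2.0, 4.0]  # hypothetical seconds per unit of load
    load = 1000               # hypothetical number of equal-cost units
    serial = load * 1.0       # single reference processor with w = 1.0
    t_elp = makespan(elp_fractions(w), w, load)
    t_dlt = makespan(dlt_fractions(w), w, load)
    print(t_elp, round(serial / t_elp, 2))            # 1000.0 1.0
    print(round(t_dlt, 1), round(serial / t_dlt, 2))  # 363.6 2.75
```

With these hypothetical speeds, ELP is limited by the slowest processor (250 units at 4 s/unit), while the proportional split balances finish times and yields a speed-up of 2.75 over a single processor with w = 1.0. The strategies in the paper target a mesh topology with a recursive refinement scheme, so this closed form only conveys the basic load-balancing idea behind DLT.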