Handling biological sequence alignments on networked computing systems: A divide-and-conquer approach

Authors:
Veeravalli Bharadwaj;Han Min Wong
Affiliations:
Computer Networks and Distributed Systems (CNDS) Laboratory, Department of Electrical and Computer Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Sing ...;Computer Networks and Distributed Systems (CNDS) Laboratory, Department of Electrical and Computer Engineering, The National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Sing ...
Venue:
Journal of Parallel and Distributed Computing
Year:
2009

Abstract

In this paper, we address the biological sequence alignment problem, one of the most commonly used steps in bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm, which is suited to large-scale processing on network-based systems, to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy that carefully partitions the computational workload among the processors in the system so as to minimize the overall time needed to determine the maximum similarity between DNA/protein sequences. We consider handling this computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to each processor according to the computation and communication link speeds along the chain. We consider two cases of sequence alignment, in which the post-processes, i.e., the trace-back processes required to determine an optimal alignment, may or may not be done at the individual processors in the system. We derive critical conditions that determine whether our strategies can yield an optimal processing time. When these conditions cannot be satisfied, we apply three different heuristic strategies proposed in the literature to generate sub-optimal solutions for processing time. To test the proposed schemes, we use real-life DNA samples of house mouse mitochondrion and human mitochondrion obtained from the public database GenBank [GenBank, http://www.ncbi.nlm.nih.gov] in our simulation experiments. Through this study, we conclusively demonstrate the applicability and potential of the DLT paradigm for such biological sequence-related computational problems.
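
To make the two ingredients of the abstract concrete, the following is a minimal sketch and not the authors' method. It assumes a Smith-Waterman-style local alignment with a linear gap penalty as the "maximum similarity" computation (the abstract does not name the algorithm or the scoring scheme), and it splits the columns of the score matrix among processors purely in proportion to hypothetical integer computation speeds. The load quanta derived in the paper also account for communication link speeds along the daisy chain, which this sketch ignores. The names `local_similarity` and `split_columns` and all parameter values are illustrative assumptions.

```python
def local_similarity(a, b, match=2, mismatch=-1, gap=-1):
    """Return the maximum local-alignment score of sequences a and b.

    Assumed scoring: Smith-Waterman style with a linear gap penalty.
    Only two rows of the score matrix are kept, so no trace-back is possible.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero in a local alignment.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def split_columns(n_cols, speeds):
    """Split n_cols matrix columns among processors by relative speed.

    speeds: positive integers (hypothetical relative computation speeds).
    Communication delays are ignored here, unlike in the paper.
    """
    total = sum(speeds)
    shares = [n_cols * s // total for s in speeds]  # floor of each share
    remainder = n_cols - sum(shares)
    # Hand the leftover columns to the fastest processors first.
    fastest_first = sorted(range(len(speeds)), key=lambda k: speeds[k], reverse=True)
    for k in fastest_first[:remainder]:
        shares[k] += 1
    return shares


if __name__ == "__main__":
    print(local_similarity("GGACGTCC", "TTACGTAA"))
    print(split_columns(10, [1, 1, 2]))
```

Keeping only two rows of the matrix corresponds to the case in which trace-back is not performed at the processor; recovering an actual alignment would require storing the full matrix or recomputing parts of it.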