An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System

Authors:
Karl Jiang;Oystein Thorsen;Amanda Peters;Brian Smith;Carlos P. Sosa
Affiliations:
-;-;-;IEEE;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2008

Citing 9
Cited 0

PVM: Parallel virtual machine: a users' guide and tutorial for networked parallel computing

Global arrays: a nonuniform memory access programming model for high-performance computers

The Journal of Supercomputing
Parallelization of local BLAST service on workstation clusters

Future Generation Computer Systems
TurboBLAST(r): A Parallel Implementation of BLAST Built on the TurboHub

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Implementing Parallel Hmm-pfam on the EARTH Multithreaded Architecture

CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Efficient Data Access for Parallel BLAST

IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
ScalaBLAST: A Scalable Implementation of BLAST for High-Performance Data-Intensive Bioinformatics Analysis

IEEE Transactions on Parallel and Distributed Systems
Parallel genomic sequence-searching on an ad-hoc grid: experiences, lessons learned, and implications

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Grid Approach to Embarrassingly Parallel CPU-Intensive Bioinformatics Problems

E-SCIENCE '06 Proceedings of the Second IEEE International Conference on e-Science and Grid Computing

Abstract

Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMM) have proven to be computationally intensive and hence amenable to architectures with multiple processors. In this paper we describe a modified version of the original parallel implementation of hidden Markov models on a massively parallel system. This is part of the HMMER bioinformatics code. HMMER 2.3.2 uses profile hidden Markov models (HMM) for sensitive database searching based on statistical descriptions of a sequence family's consensus [1]. Two of the nine programs, namely hmmsearch and hmmpfam, were further parallelized to take advantage of a large number of processors. For our study, we start by porting the parallel virtual machine (PVM) versions of these two programs currently available as part of the HMMER suite of programs. We report the performance of these non-optimized versions as baselines. Our work also includes the introduction of an alternate sequence file indexing, multiple master configuration, dynamic data collection, and finally load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16x) for hmmsearch and hmmpfam.
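
The abstract names sequence file indexing and index-based load balancing without giving details. As an illustration only, the following Python sketch shows one way such a scheme could look: it records the byte offset and residue count of each FASTA record, then assigns records to workers with a greedy longest-first heuristic. The function names build_index and balance, the FASTA input format, and the greedy heuristic are all assumptions made for this example; they are not taken from the paper or from the HMMER code.

```python
import heapq
import io


def build_index(handle):
    """Return [(byte_offset, residue_count), ...] for a FASTA stream.

    `handle` must yield bytes lines (file opened in binary mode).
    Hypothetical helper: not part of HMMER.
    """
    index = []
    offset = 0                 # byte position of the current line
    start, count = None, 0     # current record's offset and residue count
    for line in handle:
        if line.startswith(b">"):
            if start is not None:
                index.append((start, count))
            start, count = offset, 0
        elif start is not None:
            count += len(line.strip())
        offset += len(line)
    if start is not None:
        index.append((start, count))
    return index


def balance(index, n_workers):
    """Assign record offsets to workers, longest records first.

    Each record goes to the worker with the smallest total residue
    count so far (a greedy heuristic, assumed here for illustration).
    """
    heap = [(0, w) for w in range(n_workers)]   # (current load, worker id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    for offset, count in sorted(index, key=lambda rec: rec[1], reverse=True):
        load, worker = heapq.heappop(heap)
        plan[worker].append(offset)
        heapq.heappush(heap, (load + count, worker))
    return plan


if __name__ == "__main__":
    demo = io.BytesIO(b">a\nACGT\nAC\n>b\nAAAAAAAAAA\n>c\nGG\n")
    idx = build_index(demo)
    print(idx)
    print(balance(idx, 2))
```

With an index of this kind, a worker could seek directly to its assigned offsets instead of scanning the whole sequence file, and the residue counts give a rough cost estimate for distributing work before any search begins.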