An Efficient Parallel Implementation of the Hidden Markov Methods for Genomic Sequence-Search on a Massively Parallel System

  • Authors:
  • Karl Jiang;Oystein Thorsen;Amanda Peters;Brian Smith;Carlos P. Sosa

  • Affiliations:
  • -;-;-;IEEE;IEEE

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 2008

Abstract

Bioinformatics databases used for sequence comparison and sequence alignment are growing exponentially. This has popularized programs that carry out database searches. Current implementations of sequence alignment methods based on hidden Markov models (HMM) have proven to be computationally intensive and hence amenable to architectures with multiple processors. In this paper we describe a modified version of the original parallel implementation of hidden Markov models on a massively parallel system. This is part of the HMMER bioinformatics code. HMMER 2.3.2 uses profile hidden Markov models (HMM) for sensitive database searching based on statistical descriptions of a sequence family's consensus [1]. Two of the nine programs were further parallelized to take advantage of a large number of processors, namely, hmmsearch and hmmpfam. For our study, we start by porting the parallel virtual machine (PVM) versions of these two programs currently available as part of the HMMER suite of programs. We report the performance of these non-optimized versions as baselines. Our work also includes the introduction of an alternate sequence file indexing, multiple master configuration, dynamic data collection and finally load balancing via the indexed sequence files. This set of optimizations constitutes our modified version for massively parallel systems. Our results show parallel performance improvements of more than one order of magnitude (16x) for hmmsearch and hmmpfam.
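
The abstract does not specify the index format or the balancing policy, so the following is only an illustrative sketch of how load balancing via an indexed sequence file might work, not the authors' implementation. It assumes a FASTA-formatted database, an index of (byte offset, residue count) pairs, and a greedy longest-first assignment; the function names build_index and balance are hypothetical.

```python
import heapq
import io


def build_index(handle):
    """Scan a binary FASTA stream once; return [(byte_offset, residue_count)].

    Assumption: each record starts with a '>' header line, and the residue
    count is a rough proxy for the search cost of that sequence.
    """
    index = []
    offset = 0      # byte position of the line currently being read
    start = None    # byte offset of the current record's header
    length = 0      # residues counted so far in the current record
    for line in handle:
        if line.startswith(b">"):
            if start is not None:
                index.append((start, length))
            start, length = offset, 0
        elif start is not None:
            length += len(line.strip())
        offset += len(line)
    if start is not None:
        index.append((start, length))
    return index


def balance(index, n_workers):
    """Greedy longest-first assignment of records to workers.

    Returns one list of byte offsets per worker. Each record goes to the
    worker with the smallest total residue count so far.
    """
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    plan = [[] for _ in range(n_workers)]
    for offset, length in sorted(index, key=lambda r: r[1], reverse=True):
        load, w = heapq.heappop(heap)
        plan[w].append(offset)
        heapq.heappush(heap, (load + length, w))
    return plan


if __name__ == "__main__":
    fasta = io.BytesIO(b">a\nACGT\n>b\nACGTACGTAC\n>c\nAC\nGT\nAA\n>d\nA\n")
    idx = build_index(fasta)
    print(idx)
    print(balance(idx, 2))
```

With such an index, each worker can seek directly to its assigned byte offsets instead of having a single master read and distribute every sequence, which is one plausible reason an index helps at large processor counts.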