Cloud technologies for bioinformatics applications

  • Authors:
  • Xiaohong Qiu;Jaliya Ekanayake;Scott Beason;Thilina Gunarathne;Geoffrey Fox;Roger Barga;Dennis Gannon

  • Affiliations:
  • Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Microsoft Research, Microsoft Corporation, Redmond, WA;Microsoft Research, Microsoft Corporation, Redmond, WA

  • Venue:
  • Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
  • Year:
  • 2009

Abstract

Executing a large number of independent tasks, or tasks that perform minimal inter-task communication, in parallel is a common requirement in many domains. In this paper, we present our experience in applying two new Microsoft technologies, Dryad and Azure, to three bioinformatics applications. For one of the applications, we also compare them with traditional MPI and Apache Hadoop MapReduce implementations. The applications are an EST (Expressed Sequence Tag) sequence assembly program, the PhyloD statistical package for identifying HLA-associated viral evolution, and a pairwise Alu gene alignment application. We give a detailed performance discussion on a 768-core Windows HPC Server cluster and an Azure cloud. All the applications start with a "doubly data parallel step" involving independent data chosen from two similar databases (EST, Alu) or two different databases (PhyloD). The final stages are structured differently in each application.
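As an illustration of the "doubly data parallel step" described above, the following is a minimal sketch in Python, not the authors' Dryad, Azure, MPI, or Hadoop code. It assumes the step can be modeled as running one independent task for every pair of items drawn from two input sets. The names `doubly_data_parallel` and `hamming_distance` are hypothetical, and the Hamming distance is only a toy stand-in for a real alignment or statistical computation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def hamming_distance(a: str, b: str) -> int:
    """Toy stand-in for a pairwise score: count mismatched positions."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def doubly_data_parallel(set_a, set_b, task, max_workers=4):
    """Run task(a, b) independently for every pair in set_a x set_b.

    No pair depends on any other pair, so the work can be spread over
    workers without inter-task communication.
    """
    pairs = list(product(set_a, set_b))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda pair: task(*pair), pairs))
    return dict(zip(pairs, results))


if __name__ == "__main__":
    # Hypothetical input: the same small set is used on both sides,
    # mirroring the "two similar databases" case.
    seqs = ["ACGT", "ACGA", "TCGA"]
    scores = doubly_data_parallel(seqs, seqs, hamming_distance)
    print(scores[("ACGT", "TCGA")])
```

Passing two different collections as `set_a` and `set_b` would correspond to the "two different databases" case. A thread pool is used here only to keep the sketch short; on a cluster or cloud the same pair-wise decomposition would be handed to the runtime's own scheduler.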