Implementation of a Scalable Next Generation Sequencing Business Cloud Platform--An Experience Report

  • Authors:
  • Shyam Kumar Doddavula;Madhavi Rani;Santonu Sarkar;Harsh Rajesh Vachhani;Akansha Jain;Mudit Kaushik;Anirban Ghosh

  • Venue:
  • CLOUD '11 Proceedings of the 2011 IEEE 4th International Conference on Cloud Computing
  • Year:
  • 2011

Abstract

The life science industry is looking for new, cost-effective ways to manage and analyze huge amounts of genomic data to speed innovation in drug and biologics discovery. To that end, alliances among competing organizations, such as the Pistoia Alliance, are being formed to share a pool of genomic data and to build useful search and analysis techniques for the alliance partners. To make the development and management of data and applications cost-effective, secure platforms based on cloud computing are being considered. In this paper we report our experience of building such a collaborative platform on the Amazon cloud platform. To build a scalable genome sequence alignment solution, we adopted the well-known BLAST framework on the Hadoop platform. A major challenge is that the BLAST executable must be ported as is, yet execution must scale as the number of jobs increases, by elastically growing the Hadoop infrastructure. We propose a BLAST database partitioning solution to achieve optimal scalability. The results of our controlled experiment are encouraging: they show that job execution scales with the number of jobs, provided the partition sizes are chosen appropriately.
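The abstract does not spell out the partitioning scheme, so the following is only a minimal sketch of one plausible approach: splitting a FASTA sequence database into consecutive partitions capped at a chosen size, so that each partition could be searched by an unmodified BLAST executable in a separate task. The function names `parse_fasta` and `partition_records` and the `max_residues` parameter are hypothetical and not taken from the paper.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type: (header without the leading '>', sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def partition_records(records: Iterable[Record],
                      max_residues: int) -> List[List[Record]]:
    """Group records, in order, into partitions of at most max_residues
    sequence characters. A record larger than the cap gets a partition
    of its own, since sequences are never split (an assumption here)."""
    if max_residues <= 0:
        raise ValueError("max_residues must be positive")
    partitions: List[List[Record]] = []
    current: List[Record] = []
    current_size = 0
    for header, seq in records:
        # Start a new partition when adding this record would exceed the cap.
        if current and current_size + len(seq) > max_residues:
            partitions.append(current)
            current, current_size = [], 0
        current.append((header, seq))
        current_size += len(seq)
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    fasta = [">a", "ACGT", ">b", "AC", "GT", ">c", "ACGTACGT"]
    for i, part in enumerate(partition_records(parse_fasta(fasta), 8)):
        print(i, [header for header, _ in part])
```

In this sketch the cap is the tuning knob: a smaller `max_residues` yields more partitions and hence more parallel tasks, but each task adds scheduling overhead. This is one way to read the abstract's remark that partition sizes must be chosen appropriately.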