Experiences in building a next-generation sequencing analysis service using Galaxy, Globus Online and Amazon Web Service

  • Authors:
  • Ravi K. Madduri;Paul Dave;Dinanath Sulakhe;Lukasz Lacinski;Bo Liu;Ian T. Foster

  • Affiliations:
  • Argonne National Laboratory, Argonne, IL;University of Chicago, Chicago, IL;University of Chicago, Chicago, IL;University of Chicago, Chicago, IL;University of Chicago, Chicago, IL;Argonne National Laboratory, Argonne, IL

  • Venue:
  • Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery
  • Year:
  • 2013

Abstract

We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system is notable for its high degree of end-to-end automation, which covers every stage of the data analysis pipeline: initial data access (from a remote sequencing center or database, via the Globus Online file transfer system); on-demand resource acquisition (on Amazon EC2, via the Globus Provision cloud manager); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); and efficient scheduling of these pipelines over many processors (via the Condor scheduler). The system allows biomedical researchers to analyze large NGS datasets rapidly and in a fully automated manner, using just a web browser and without installing any software.
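The last component in the abstract, scheduling many pipeline instances over many processors, can be illustrated with a toy model. The sketch below is not the Globus Genomics implementation and does not call Galaxy, Globus, EC2, or Condor. It assumes a workflow is an ordered list of steps applied to one dataset, and that independent datasets can run concurrently on a fixed pool of workers that stands in for a batch scheduler. The step names (`quality_filter`, `align`, `call_variants`) and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical workflow steps; the abstract does not list the actual pipeline contents.
WORKFLOW = ["quality_filter", "align", "call_variants"]


def run_step(step: str, data: str) -> str:
    """Placeholder for one tool invocation; it only records that the step ran."""
    return f"{step}({data})"


def run_workflow(dataset: str) -> str:
    """Apply every workflow step, in order, to a single dataset."""
    result = dataset
    for step in WORKFLOW:
        result = run_step(step, result)
    return result


def run_all(datasets: list[str], processors: int = 4) -> dict[str, str]:
    """Run one independent workflow instance per dataset over a fixed worker pool.

    The thread pool is only a stand-in for a batch scheduler such as Condor.
    """
    with ThreadPoolExecutor(max_workers=processors) as pool:
        # map preserves input order, so results line up with their datasets.
        results = pool.map(run_workflow, datasets)
        return dict(zip(datasets, results))


if __name__ == "__main__":
    out = run_all(["sample_A.fastq", "sample_B.fastq", "sample_C.fastq"], processors=2)
    for name, res in out.items():
        print(name, "->", res)
```

Because each dataset's workflow is independent, throughput in this model scales with the number of workers until the datasets run out; a real scheduler additionally handles resource matching, failures, and data placement, which the sketch ignores.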