Adapting bioinformatics applications for heterogeneous systems: a case study

  • Authors:
  • Irena Lanc; Peter Bui; Douglas Thain; Scott Emrich

  • Affiliations:
  • University of Notre Dame, Notre Dame, IN, USA; University of Notre Dame, Notre Dame, IN, USA; University of Notre Dame, Notre Dame, IN, USA; University of Notre Dame, Notre Dame, IN, USA

  • Venue:
  • Proceedings of the second international workshop on Emerging computational methods for the life sciences
  • Year:
  • 2011

Abstract

The advent of new sequencing technologies has generated extremely large amounts of information. To be applied successfully to such large datasets, bioinformatics tools need to exhibit scalability and, ideally, elasticity in diverse computing environments. We describe the application of Weaver to the PEMer structural variation detection workflow. Because the original workflow has an intractable sequential running time on large datasets, it also has a batch implementation designed for a shared file system. Using scripts provided by the developers of PEMer, along with the Weaver Python module, the Starch archive generator, and the Makeflow workflow engine, we have refactored PEMer for elastic scaling on personal clouds. Our case study describes the challenges faced when constructing such a workflow, from detecting failures, to managing dependencies, to handling the quirks of the underlying operating systems. Scaling bioinformatics tools is increasingly commonplace, so the hands-on application of refactoring techniques to PEMer can serve as a valuable guide for those looking to reconfigure other bioinformatics software. Significantly, our customized Makeflow framework enabled elastic deployment on a wider variety of systems while substantially reducing wall-clock runtimes using hundreds of cores.