Distributed workflow-driven analysis of large-scale biological data using bioKepler

  • Authors:
  • Ilkay Altintas

  • Affiliations:
  • University of California, San Diego, La Jolla, CA, USA

  • Venue:
  • Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities
  • Year:
  • 2011

Abstract

Next-generation DNA sequencing machines are generating very large volumes of sequence data with applications to many scientific challenges, and this data places unprecedented demands on traditional single-processor bioinformatics algorithms. Technologies such as scientific workflows and data-intensive computing promise new capabilities for the rapid analysis of next-generation sequence data. Motivated by this, and drawing on our previous experience in bioinformatics and distributed scientific workflows, we are creating a module for the Kepler Scientific Workflow System, called "bioKepler", that facilitates the development of Kepler workflows for the integrated execution of bioinformatics applications in distributed environments. This invited talk discusses the challenges posed by next-generation sequencing data and explains the approaches bioKepler takes to support the analysis of such data.