Distributed workflow-driven analysis of large-scale biological data using bioKepler

Authors:
Ilkay Altintas
Affiliations:
University of California, San Diego, La Jolla, CA, USA
Venue:
Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities
Year:
2011

Abstract

Next-generation DNA sequencing machines are generating very large amounts of sequence data with applications to many scientific challenges, placing unprecedented demands on traditional single-processor bioinformatics algorithms. Technologies such as scientific workflows and data-intensive computing promise new capabilities for the rapid analysis of next-generation sequence data. Motivated by this, and building on our previous experience in bioinformatics and distributed scientific workflows, we are creating a Kepler Scientific Workflow System module, called "bioKepler", that facilitates the development of Kepler workflows for the integrated execution of bioinformatics applications in distributed environments. This invited talk discusses the challenges posed by next-generation sequencing data and explains the approaches taken in bioKepler to support the analysis of such data.
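A common way to escape the single-processor bottleneck described above is the data-parallel "split, process, merge" pattern: partition the reads, analyze each partition independently, then combine the partial results. The sketch below illustrates only that general pattern. It is not bioKepler's API or implementation, and every function name in it (`parse_fasta`, `split_into_chunks`, `count_bases`, `merge_counts`, `gc_content`) is hypothetical. GC content is an assumed stand-in for a real per-read analysis, and a local thread pool stands in for the distributed workers (cluster nodes, Hadoop tasks, etc.) a workflow system would actually dispatch chunks to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {read_id: sequence} (sequence upper-cased)."""
    reads: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The read ID is the first whitespace-delimited token after '>'.
            current_id = line[1:].split()[0]
            reads[current_id] = ""
        elif current_id is not None:
            reads[current_id] += line.upper()
    return reads


def split_into_chunks(items: List[str], n_chunks: int) -> List[List[str]]:
    """Split step: partition items into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_bases(chunk: List[str]) -> Dict[str, int]:
    """Process step: count nucleotides in one chunk; unknown symbols count as N."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0, "N": 0}
    for seq in chunk:
        for base in seq:
            counts[base if base in counts else "N"] += 1
    return counts


def merge_counts(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Merge step: sum the per-chunk counts."""
    total: Dict[str, int] = {}
    for part in partials:
        for base, n in part.items():
            total[base] = total.get(base, 0) + n
    return total


def gc_content(fasta_text: str, n_workers: int = 4) -> float:
    """GC fraction over called bases (A/C/G/T), computed chunk-by-chunk in parallel."""
    reads = list(parse_fasta(fasta_text).values())
    chunks = split_into_chunks(reads, n_workers)
    # Threads stand in for distributed workers in this local sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        totals = merge_counts(pool.map(count_bases, chunks))
    called = sum(totals.get(b, 0) for b in "ACGT")
    if called == 0:
        return 0.0
    return (totals.get("G", 0) + totals.get("C", 0)) / called


if __name__ == "__main__":
    sample = ">r1\nACGT\n>r2\nGGCC\n>r3\nAATT\n"
    print(gc_content(sample))  # 0.5
```

Because each chunk is processed independently and the merge is a simple sum, the same structure carries over unchanged when the thread pool is replaced by a cluster or MapReduce back end.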