Cloud MapReduce for Monte Carlo bootstrap applied to Metabolic Flux Analysis

Authors:
Tolga Dalman;Tim Dörnemann;Ernst Juhnke;Michael Weitzel;Wolfgang Wiechert;Katharina Nöh;Bernd Freisleben
Affiliations:
Institute of Bio- and Geosciences 1: Biotechnology 2, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52428 Jülich, Germany;Department of Mathematics & Computer Science and Center for Synthetic Microbiology, University of Marburg, Hans-Meerwein-Straße 3, D-35032 Marburg, Germany;Department of Mathematics & Computer Science and Center for Synthetic Microbiology, University of Marburg, Hans-Meerwein-Straße 3, D-35032 Marburg, Germany;Institute of Bio- and Geosciences 1: Biotechnology 2, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52428 Jülich, Germany;Institute of Bio- and Geosciences 1: Biotechnology 2, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52428 Jülich, Germany;Institute of Bio- and Geosciences 1: Biotechnology 2, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52428 Jülich, Germany;Department of Mathematics & Computer Science and Center for Synthetic Microbiology, University of Marburg, Hans-Meerwein-Straße 3, D-35032 Marburg, Germany
Venue:
Future Generation Computer Systems
Year:
2013

Abstract

The MapReduce architectural pattern popularized by Google has been used successfully in several scientific applications. Until now, however, MapReduce has rarely been employed in the field of Systems Biology. We investigate whether a MapReduce approach that uses on-demand resources from a Cloud is suitable for simulation tasks in the area of Metabolic Flux Analysis (MFA). An Amazon Elastic MapReduce Cloud implementation of the parallel, parametric Monte Carlo bootstrap in the context of ¹³C-MFA is presented. The seamless integration of the application into a service-oriented, BPEL-based scientific workflow framework is shown. A comparison of a straightforward MapReduce implementation, which uses the Hadoop streaming interface on various Amazon Elastic MapReduce instance types, with a computation on a single CPU core reveals a speedup of 17 on 64 Amazon cores. I/O operations on many small files within the Reduce step were identified as the limiting factor. By exploiting the Hadoop Java API, making use of built-in data types, and tuning problem-specific Hadoop parameters, the I/O issues could be resolved. With the revised implementation, a speedup of up to 48 could be achieved on 64 Amazon cores. To investigate the runtime of a realistic ¹³C-MFA analysis, 50,000 Monte Carlo samples with a typical metabolic network model were computed on 20 virtual nodes in 24 h and 23 min at a total cost of $384. Our work demonstrates that scalable Systems Biology applications can be run on Amazon's Cloud MapReduce service.
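The figures quoted in the abstract can be put into perspective with a little arithmetic. The following Python sketch derives the parallel efficiency (speedup divided by core count) of both implementations, along with the average cost and throughput of the 50,000-sample run. The function and variable names are illustrative and do not come from the paper; the sketch assumes the reported $384 covers the whole run and that the samples were produced at a uniform rate.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
# All names here are illustrative assumptions, not identifiers from the paper.

def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / cores

CORES = 64
eff_streaming = parallel_efficiency(17, CORES)  # Hadoop streaming version
eff_java_api = parallel_efficiency(48, CORES)   # revised Hadoop Java API version

SAMPLES = 50_000
TOTAL_COST_USD = 384.0
WALL_MINUTES = 24 * 60 + 23                     # 24 h and 23 min

cost_per_sample = TOTAL_COST_USD / SAMPLES      # average USD per bootstrap sample
samples_per_minute = SAMPLES / WALL_MINUTES     # average throughput on 20 nodes

print(f"efficiency (streaming): {eff_streaming:.2f}")
print(f"efficiency (Java API):  {eff_java_api:.2f}")
print(f"cost per sample:        ${cost_per_sample:.5f}")
print(f"samples per minute:     {samples_per_minute:.1f}")
```

Under these assumptions, the streaming version reaches roughly 27% of ideal linear scaling on 64 cores, while the revised version reaches 75%, and each Monte Carlo sample costs somewhat less than one US cent on average.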