HyMR: a hybrid MapReduce workflow system

Authors:
Yang Ruan;Zhenhua Guo;Yuduo Zhou;Judy Qiu;Geoffrey Fox
Affiliations:
Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA
Venue:
Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
Year:
2012

Abstract

Many distributed computing models have been developed for high-performance processing of large-scale scientific data. Among them, MapReduce is a popular and widely used fine-grained parallel runtime. Workflows integrate and coordinate distributed, heterogeneous components to solve a computational problem, which may involve several MapReduce jobs. However, existing workflow solutions offer limited support for important features such as fault tolerance and efficient execution of iterative applications. In this paper, we propose HyMR, a hybrid MapReduce workflow system based on two different MapReduce frameworks. HyMR optimizes scheduling for individual jobs and supports fault tolerance for the entire workflow pipeline. A distributed file system is used for fast data sharing between jobs. We compare a pipeline built with HyMR against a workflow model based on a single MapReduce framework. Our results show that the hybrid model achieves higher efficiency.