HyMR: a hybrid MapReduce workflow system

  • Authors:
  • Yang Ruan;Zhenhua Guo;Yuduo Zhou;Judy Qiu;Geoffrey Fox

  • Affiliations:
  • Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA;Indiana University Bloomington, Bloomington, IN, USA

  • Venue:
  • Proceedings of the 3rd international workshop on Emerging computational methods for the life sciences
  • Year:
  • 2012

Abstract

Many distributed computing models have been developed for high-performance processing of large-scale scientific data. Among them, MapReduce is a popular and widely used fine-grained parallel runtime. Workflows integrate and coordinate distributed, heterogeneous components to solve a computational problem, which may involve several MapReduce jobs. However, existing workflow solutions offer limited support for important features such as fault tolerance and efficient execution of iterative applications. In this paper, we propose HyMR, a hybrid MapReduce workflow system built on two different MapReduce frameworks. HyMR optimizes scheduling for individual jobs and supports fault tolerance for the entire workflow pipeline. A distributed file system is used for fast data sharing between jobs. We compare a pipeline built with HyMR against a workflow model based on a single MapReduce framework. Our results show that the hybrid model achieves higher efficiency.
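
The hybrid idea in the abstract can be illustrated with a minimal Python sketch. It is not HyMR's implementation: the abstract does not name the two frameworks or describe the scheduler, so the runtime labels (`batch-runtime`, `iterative-runtime`), the `Job` class, the `iterative` flag, the retry policy, and the in-memory `store` standing in for the distributed file system are all hypothetical. The sketch only shows the general shape: each job is routed to one of two runtimes, outputs are shared through a common store, and a failed job is retried so the pipeline as a whole survives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Job:
    """One workflow step (hypothetical structure, not HyMR's API)."""
    name: str
    iterative: bool  # assumed flag: does the job loop over the same data?
    run: Callable[[Dict[str, object]], object]  # reads the shared store, returns output


def pick_runtime(job: Job) -> str:
    """Route a job to one of two placeholder runtimes based on its type."""
    return "iterative-runtime" if job.iterative else "batch-runtime"


def run_workflow(
    jobs: List[Job], max_retries: int = 2
) -> Tuple[Dict[str, object], List[Tuple[str, str, int]]]:
    """Run jobs in order, sharing outputs and retrying failed jobs."""
    store: Dict[str, object] = {}  # stand-in for a distributed file system
    log: List[Tuple[str, str, int]] = []  # (job name, runtime, successful attempt)
    for job in jobs:
        runtime = pick_runtime(job)
        for attempt in range(1, max_retries + 2):
            try:
                store[job.name] = job.run(store)
                log.append((job.name, runtime, attempt))
                break
            except RuntimeError:
                # Re-raise only after the final allowed attempt has failed.
                if attempt == max_retries + 1:
                    raise
    return store, log
```

In this sketch a later job reads an earlier job's output by name from `store`, which mirrors the role the abstract gives to the distributed file system; how HyMR actually selects a framework and recovers from failures is described in the paper itself, not here.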