MapIterativeReduce: a framework for reduction-intensive data processing on Azure clouds

Authors:
Radu Tudoran; Alexandru Costan; Gabriel Antoniu
Affiliations:
INRIA Rennes - Bretagne Atlantique, Rennes, France (all authors)
Venue:
Proceedings of the Third International Workshop on MapReduce and its Applications
Year:
2012

Abstract

With the emergence of cloud computing as an alternative to supercomputers for data-intensive applications, MapReduce has become a major programming model for data analysis on clouds. In this context, reduce-intensive algorithms are increasingly useful in applications such as data clustering, classification, and mining. However, platforms like MapReduce or Dryad lack built-in support for reduce-intensive workloads. This paper introduces MapIterativeReduce, a framework that 1) extends the MapReduce programming model to better support reduce-intensive applications and 2) substantially improves their efficiency by eliminating the implicit barrier between the Map and Reduce phases. We evaluated MapIterativeReduce on the Microsoft Azure cloud with synthetic benchmarks and with a real-life application. Compared to state-of-the-art solutions, our approach reduces execution time by up to 75%.
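
The idea of reducing without a barrier between the two phases can be illustrated with a small sequential sketch. This is not the authors' implementation: the function name `map_iterative_reduce`, the `fan_in` parameter, and the single in-memory queue are hypothetical choices made for illustration, and the reduce function is assumed to be associative and commutative. Partial results are combined as soon as enough of them are available, and reduction repeats until a single value remains.

```python
from collections import deque
from functools import reduce
from typing import Callable, Deque, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_iterative_reduce(
    inputs: Iterable[T],
    map_fn: Callable[[T], R],
    reduce_fn: Callable[[R, R], R],
    fan_in: int = 2,
) -> R:
    """Illustrative sketch: combine mapped values without waiting for all maps.

    Assumes reduce_fn is associative and commutative, because partial
    results may be combined in a different order than the inputs arrived.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    pending: Deque[R] = deque()  # partial results waiting to be reduced

    for item in inputs:
        pending.append(map_fn(item))
        # No barrier: reduce as soon as fan_in partial results are available,
        # even though later inputs have not been mapped yet.
        if len(pending) >= fan_in:
            batch = [pending.popleft() for _ in range(fan_in)]
            pending.append(reduce(reduce_fn, batch))

    if not pending:
        raise ValueError("inputs must not be empty")

    # Iterate: keep reducing the remaining partial results to a single value.
    while len(pending) > 1:
        size = min(fan_in, len(pending))
        batch = [pending.popleft() for _ in range(size)]
        pending.append(reduce(reduce_fn, batch))

    return pending[0]


if __name__ == "__main__":
    # Sum of squares of 0..9, reduced three partial results at a time.
    total = map_iterative_reduce(range(10), lambda x: x * x, lambda a, b: a + b, fan_in=3)
    print(total)
```

In a real deployment the mappers and reducers would run on separate cloud workers and exchange partial results through shared storage or queues; the single-process loop here only shows the control flow under those stated assumptions.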