Dynamic Data Redistribution for MapReduce Joins

Authors:
Steven Lynden;Yusuke Tanimura;Isao Kojima;Akiyoshi Matono
Venue:
CLOUDCOM '11 Proceedings of the 2011 IEEE Third International Conference on Cloud Computing Technology and Science
Year:
2011

Abstract

MapReduce has become a popular method for data processing, in particular for large-scale datasets, owing to its accessibility as a scalable yet convenient programming paradigm. Data processing tasks often involve joins, and the repartition and fragment-replicate joins are two widely used join algorithms within the MapReduce framework. This paper presents a multi-join algorithm that supports tuple redistribution, building on both the repartition and fragment-replicate joins. Using Hadoop, with Apache ZooKeeper as the means of communication and data transfer, the paper demonstrates how reduce tasks may improve performance by passing intermediate results to other reduce tasks that are better able to process them. A performance analysis shows that the technique has the potential to reduce response times when processing multiple joins in a single MapReduce job.
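For readers unfamiliar with the two baseline algorithms named above, the following is a minimal in-memory sketch of their textbook form, not the paper's multi-join or its ZooKeeper-based redistribution. It uses no Hadoop APIs: the function names (`repartition_join`, `fragment_replicate_join`), the `num_reducers` parameter, and the dict-based tuple representation are all hypothetical, chosen only for illustration. The repartition join routes tuples from both relations to reducers by hashing the join key (so the join happens reduce-side, after a shuffle), while the fragment-replicate join copies the small relation to every map task and joins each fragment of the large relation locally (no shuffle).

```python
from collections import defaultdict


def repartition_join(left, right, key_left, key_right, num_reducers=3):
    """Reduce-side (repartition) join, simulated in memory.

    Map/shuffle: each tuple is tagged by source relation and sent to the
    reducer chosen by hashing its join key. Reduce: each reducer joins the
    tuples that share a key within its own partition.
    """
    # One partition per hypothetical reducer: key -> (left tuples, right tuples).
    partitions = [defaultdict(lambda: ([], [])) for _ in range(num_reducers)]
    for row in left:
        k = row[key_left]
        partitions[hash(k) % num_reducers][k][0].append(row)
    for row in right:
        k = row[key_right]
        partitions[hash(k) % num_reducers][k][1].append(row)

    out = []
    for part in partitions:
        for ls, rs in part.values():
            # Emit the cross product of matching tuples from both sides.
            for l in ls:
                for r in rs:
                    out.append({**l, **r})
    return out


def fragment_replicate_join(large_fragments, small, key_large, key_small):
    """Map-side (fragment-replicate) join, simulated in memory.

    The small relation is replicated to every map task as a hash table;
    each task joins one fragment of the large relation without a shuffle.
    """
    table = defaultdict(list)
    for r in small:
        table[r[key_small]].append(r)

    out = []
    for fragment in large_fragments:  # each fragment stands for one map task's input split
        for l in fragment:
            for r in table.get(l[key_large], []):
                out.append({**l, **r})
    return out


orders = [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}, {"cust": 1, "item": "c"}]
customers = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}]

rj = repartition_join(orders, customers, "cust", "id")
frj = fragment_replicate_join([orders[:2], orders[2:]], customers, "cust", "id")
```

Both calls produce the same three joined tuples, possibly in different orders. The trade-off sketched here is the usual one: the repartition join handles two large inputs at the cost of a full shuffle, whereas the fragment-replicate join avoids the shuffle but requires one input small enough to replicate to every task.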