LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud

Authors:
Shadi Ibrahim;Hai Jin;Lu Lu;Song Wu;Bingsheng He;Li Qi
Venue:
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Year:
2010

Citing 0
Cited 9

SkewTune: mitigating skew in mapreduce applications
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Improving MapReduce Performance in Heterogeneous Network Environments and Resource Utilization
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Maestro: Replica-Aware Map Scheduling for MapReduce
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

SkewTune in action: mitigating skew in MapReduce applications
Proceedings of the VLDB Endowment

Designing good algorithms for MapReduce and beyond
Proceedings of the Third ACM Symposium on Cloud Computing

Cloud MapReduce for Monte Carlo bootstrap applied to Metabolic Flux Analysis
Future Generation Computer Systems

Bisimulation reduction of big graphs on mapreduce
BNCOD'13 Proceedings of the 29th British National conference on Big Data

Parallel labeling of massive XML data with MapReduce
The Journal of Supercomputing

Balancing reducer workload for skewed data using sampling-based partitioning
Computers and Electrical Engineering

Abstract

This paper investigates the problem of partitioning skew in MapReduce-based systems. Our studies with Hadoop, a widely used MapReduce implementation, demonstrate that partitioning skew causes a large amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input across data nodes. As a result, applications suffer performance degradation from the long data transfer during the shuffle phase and from computation skew, particularly in the reduce phase. We develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop-0.18.0. Our experiments demonstrate that LEEN efficiently achieves higher locality and reduces the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 40% on different workloads.
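
The trade-off described in the abstract, between keeping intermediate data local and keeping reduce inputs evenly distributed, can be illustrated with a small sketch. This is not the LEEN algorithm as published: the greedy cost function, the `weight` parameter, and the names `evaluate` and `locality_fairness_partition` are assumptions made for illustration. The sketch assumes each key's per-node frequency is known once the map outputs are buffered. It then assigns each key to the node that minimizes the records that must be shuffled plus a penalty for load imbalance.

```python
from statistics import pstdev

# freq[key][node] = number of intermediate records with `key` that were
# produced by map tasks running on `node` (hypothetical input layout).

def evaluate(freq, assignment, num_nodes):
    """Return (shuffled_records, stdev_of_reduce_inputs) for a key -> node map."""
    loads = [0] * num_nodes
    shuffled = 0
    for key, counts in freq.items():
        dest = assignment[key]
        total = sum(counts)
        loads[dest] += total
        # Records already on the destination node need no network transfer.
        shuffled += total - counts[dest]
    return shuffled, pstdev(loads)

def locality_fairness_partition(freq, num_nodes, weight=1.0):
    """Greedy heuristic (an assumption, not the paper's method).

    Keys are placed heaviest first. Each key goes to the node with the
    lowest cost, where cost = remote records + weight * load imbalance.
    """
    loads = [0] * num_nodes
    assignment = {}
    for key in sorted(freq, key=lambda k: sum(freq[k]), reverse=True):
        counts = freq[key]
        total = sum(counts)

        def cost(node):
            remote = total - counts[node]      # locality term
            trial = list(loads)
            trial[node] += total
            return remote + weight * pstdev(trial)  # fairness term

        best = min(range(num_nodes), key=cost)
        assignment[key] = best
        loads[best] += total
    return assignment

if __name__ == "__main__":
    # Three nodes; each key is produced mostly on one node.
    freq = {"a": [8, 1, 1], "b": [1, 8, 1], "c": [1, 1, 8]}
    plan = locality_fairness_partition(freq, num_nodes=3)
    print(plan, evaluate(freq, plan, 3))
```

Setting `weight=0` reduces the heuristic to pure locality (each key goes to the node that already holds most of its records), while a large `weight` favors balanced reduce inputs at the cost of more shuffled data.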