Data-Intensive Workload Consolidation for the Hadoop Distributed File System

  • Authors:
  • Reza Moraveji;Javid Taheri;Mohammad Reza;Nikzad Babaii Rizvandi;Albert Y. Zomaya

  • Venue:
  • GRID '12: Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
  • Year:
  • 2012

Abstract

Workload consolidation, the sharing of physical resources among multiple workloads, is a promising technique for saving cost and energy in cluster computing systems. This paper highlights a number of challenges associated with workload consolidation for Hadoop, one of the current state-of-the-art data-intensive cluster computing systems. Through a systematic, step-by-step procedure, we investigate the challenges to efficient server consolidation in Hadoop environments. To this end, we first investigate the interrelationship between last-level cache (LLC) contention and throughput degradation for workloads consolidated on a single physical server running the Hadoop Distributed File System (HDFS). We then investigate the general case of consolidation across multiple physical servers such that their throughput never falls below a desired, predefined utilization level. We use our empirical results to model consolidation as a classic two-dimensional bin packing problem, and we design a computationally efficient greedy algorithm that minimizes throughput degradation across multiple servers. Results are very promising and show that our greedy approach achieves near-optimal solutions in all experimental cases.