Data-Intensive Workload Consolidation for the Hadoop Distributed File System

  • Authors:
  • Reza Moraveji;Javid Taheri;Mohammad Reza;Nikzad Babaii Rizvandi;Albert Y. Zomaya

  • Venue:
  • GRID '12: Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
  • Year:
  • 2012

Abstract

Workload consolidation, the sharing of physical resources among multiple workloads, is a promising technique for saving cost and energy in cluster computing systems. This paper highlights a number of challenges associated with workload consolidation for Hadoop, one of the current state-of-the-art data-intensive cluster computing systems. Through a systematic, step-by-step procedure, we investigate the challenges to efficient server consolidation in Hadoop environments. To this end, we first investigate the interrelationship between last-level cache (LLC) contention and throughput degradation for workloads consolidated on a single physical server running the Hadoop Distributed File System (HDFS). We then investigate the general case of consolidation across multiple physical servers such that their throughput never falls below a desired, predefined utilization level. We use our empirical results to model consolidation as a classic two-dimensional bin packing problem, and we design a computationally efficient greedy algorithm that minimizes throughput degradation across multiple servers. Results are very promising and show that our greedy approach achieves near-optimal solutions in all experimental cases.