Evaluation and Analysis of GreenHDFS: A Self-Adaptive, Energy-Conserving Variant of the Hadoop Distributed File System

  • Authors:
  • Rini T. Kaushik;Milind Bhandarkar;Klara Nahrstedt

  • Venue:
  • CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
  • Year:
  • 2010

Abstract

We present a detailed evaluation and sensitivity analysis of GreenHDFS, an energy-conserving, highly scalable variant of the Hadoop Distributed File System (HDFS). GreenHDFS logically divides the servers in a Hadoop cluster into Hot and Cold Zones and relies on insightful, data-classification-driven, energy-conserving data placement to realize guaranteed, substantially long periods (several days) of idleness in a significant subset of servers in the Cold Zone. Detailed lifespan analysis of the files in a large-scale production Hadoop cluster at Yahoo! points to the viability of GreenHDFS. Simulation results with real-world Yahoo! HDFS traces show that GreenHDFS can achieve a 24% reduction in energy costs by doing power management in only one top-level tenant directory in the cluster, and that it meets all the scale-down mandates in spite of the unique scale-down challenges present in a Hadoop cluster. If the GreenHDFS technique is applied to all the Hadoop clusters at Yahoo! (amounting to 38,000 servers), $2.1 million can be saved in energy costs per annum. Sensitivity analysis shows that energy conservation is minimally sensitive to the thresholds in GreenHDFS. Lifespan analysis points out that one-size-fits-all energy-management policies won’t suffice in a multi-tenant Hadoop cluster.