GreenHDFS: towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster

  • Authors:
  • Rini T. Kaushik;Milind Bhandarkar

  • Affiliations:
  • The University of Illinois, Urbana-Champaign and Yahoo! Inc.;Yahoo! Inc.

  • Venue:
  • HotPower'10 Proceedings of the 2010 international conference on Power aware computing and systems
  • Year:
  • 2010

Abstract

The Hadoop Distributed File System (HDFS) presents unique challenges to existing energy-conservation techniques and makes it hard to scale down servers. We propose GreenHDFS, an energy-conserving, hybrid, logical multi-zoned variant of HDFS for managing data-processing-intensive, commodity Hadoop clusters. GreenHDFS's data-classification-driven data placement allows scale-down by guaranteeing substantially long periods (several days) of idleness in a subset of servers in the datacenter designated as the Cold Zone. These servers are then transitioned to high-energy-saving, inactive power modes. This is done without impacting the performance of the Hot Zone: studies have shown that servers in data-intensive compute clusters are under-utilized, so opportunities exist for better consolidation of the workload on the Hot Zone. Analysis of the traces of a Yahoo! Hadoop cluster showed significant heterogeneity in the data's access patterns, which can be used to guide energy-aware data placement policies. Trace-driven simulation with three-month-long real-life HDFS traces from a Hadoop cluster at Yahoo! shows a 26% reduction in energy consumption from Cold Zone power management alone. An analytical cost model projects savings of $14.6 million in 3-year total cost of ownership (TCO), and simulation results extrapolate savings of $2.4 million annually when the GreenHDFS technique is applied across all of Yahoo!'s Hadoop clusters (amounting to 38,000 servers).
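
The idea of data-classification-driven placement can be illustrated with a minimal sketch. The abstract does not state the classification rule, so everything below is an assumption made for illustration: the `FileRecord` type, the `classify` and `plan_placement` helpers, and the 15-day idleness threshold are hypothetical and are not taken from the paper or from any HDFS API. The sketch simply labels a file "cold" once it has gone unaccessed for longer than a threshold, and groups file paths by zone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical idleness threshold; the abstract gives no concrete value.
COLDNESS_THRESHOLD = timedelta(days=15)


@dataclass
class FileRecord:
    """Illustrative stand-in for per-file metadata (not an HDFS type)."""
    path: str
    last_access: datetime


def classify(record, now, threshold=COLDNESS_THRESHOLD):
    """Return 'cold' if the file has been idle for at least `threshold`, else 'hot'."""
    idle_time = now - record.last_access
    return "cold" if idle_time >= threshold else "hot"


def plan_placement(records, now, threshold=COLDNESS_THRESHOLD):
    """Group file paths by the zone they would be assigned to under this rule."""
    zones = {"hot": [], "cold": []}
    for record in records:
        zones[classify(record, now, threshold)].append(record.path)
    return zones


# Example with made-up paths and dates.
now = datetime(2010, 10, 1)
records = [
    FileRecord("/data/recent.log", datetime(2010, 9, 30)),  # idle for 1 day
    FileRecord("/data/old.log", datetime(2010, 6, 1)),      # idle for months
]
placement = plan_placement(records, now)
print(placement)
```

Under a rule of this kind, files that stay idle accumulate in the Cold Zone, which is what would let its servers remain idle for days at a time and move to inactive power modes. A real policy would also need to handle files that become active again; that is outside the scope of this sketch.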