Energy efficiency for large-scale MapReduce workloads with significant interactive analysis

Authors:
Yanpei Chen;Sara Alspaugh;Dhruba Borthakur;Randy Katz
Affiliations:
University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;Facebook, Menlo Park, CA, USA;University of California, Berkeley, Berkeley, CA, USA
Venue:
Proceedings of the 7th ACM European Conference on Computer Systems
Year:
2012

Citing 28
Cited 12

Scheduling a mixed interactive and batch workload on a parallel, shared memory supercomputer

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Quickly generating billion-record synthetic databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
The interaction of parallel and sequential workloads on a network of workstations

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
A simple linear model of demand paging performance

Communications of the ACM
Integrated Fluid and Packet Network Simulations

MASCOTS '02 Proceedings of the 10th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Power and Energy Management for Server Systems

Computer
JouleSort: a balanced energy-efficiency benchmark

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Power provisioning for a warehouse-sized computer

Proceedings of the 34th annual international symposium on Computer architecture
Failure trends in a large disk drive population

FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Accurate on-line prediction of processor and memory energy usage under voltage scaling

EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
System-Level Performance Metrics for Multiprogram Workloads

IEEE Micro
PowerNap: eliminating server idle power

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Quincy: fair scheduling for distributed computing clusters

Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
On the energy (in)efficiency of Hadoop clusters

ACM SIGOPS Operating Systems Review
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling

Proceedings of the 5th European conference on Computer systems
Towards characterizing cloud backend workloads: insights from Google compute clusters

ACM SIGMETRICS Performance Evaluation Review
ParaTimer: a progress indicator for MapReduce DAGs

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Data warehousing and analytics infrastructure at Facebook

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Energy management for MapReduce clusters

Proceedings of the VLDB Endowment
Dremel: interactive analysis of web-scale datasets

Proceedings of the VLDB Endowment
Evaluation and Analysis of GreenHDFS: A Self-Adaptive, Energy-Conserving Variant of the Hadoop Distributed File System

CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
MemScale: active low-power modes for main memory

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Scarlett: coping with skewed content popularity in mapreduce clusters

Proceedings of the sixth conference on Computer systems
Apache hadoop goes realtime at Facebook

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Power management of online data-intensive services

Proceedings of the 38th annual international symposium on Computer architecture
Warehouse-Scale Computing: Entering the Teenage Decade

Proceedings of the 38th annual international symposium on Computer architecture
The Case for Evaluating MapReduce Performance Using Workload Suites

MASCOTS '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems

Programming your network at run-time for big data applications

Proceedings of the first workshop on Hot topics in software defined networks
Why let resources idle? aggressive cloning of jobs with dolly

HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing
MixApart: decoupled analytics for shared storage systems

HotStorage'12 Proceedings of the 4th USENIX conference on Hot Topics in Storage and File Systems
Handling more data with less cost: taming power peaks in MapReduce clusters

Proceedings of the Asia-Pacific Workshop on Systems
Towards energy-efficient database cluster design

Proceedings of the VLDB Endowment
Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads

Proceedings of the VLDB Endowment
Handling more data with less cost: taming power peaks in mapreduce clusters

APSys'12 Proceedings of the Third ACM SIGOPS Asia-Pacific conference on Systems
Omega: flexible, scalable schedulers for large compute clusters

Proceedings of the 8th ACM European Conference on Computer Systems
Effective straggler mitigation: attack of the clones

NSDI'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions

Journal of Grid Computing
iPACS: Power-aware covering sets for energy proportionality and performance in data parallel computing clusters

Journal of Parallel and Distributed Computing
Maximal clique enumeration for large graphs on hadoop framework

Proceedings of the first workshop on Parallel programming for analytics applications

Abstract

MapReduce workloads have evolved to include increasing amounts of time-sensitive, interactive data analysis; we refer to such workloads as MapReduce with Interactive Analysis (MIA). Such workloads run on large clusters, whose size and cost make energy efficiency a critical concern. Prior work on MapReduce energy efficiency has not yet considered this workload class. Increasing hardware utilization helps improve efficiency, but is challenging to achieve for MIA workloads. These concerns lead us to develop BEEMR (Berkeley Energy Efficient MapReduce), an energy-efficient MapReduce workload manager motivated by empirical analysis of real-life MIA traces at Facebook. The key insight is that although MIA clusters host huge data volumes, the interactive jobs operate on a small fraction of the data and thus can be served by a small pool of dedicated machines; the less time-sensitive jobs can run on the rest of the cluster in a batch fashion. BEEMR achieves 40-50% energy savings under tight design constraints, and represents a first step towards improving energy efficiency for an increasingly important class of datacenter workloads.