Apache Hadoop YARN: yet another resource negotiator

Authors:
Vinod Kumar Vavilapalli;Arun C. Murthy;Chris Douglas;Sharad Agarwal;Mahadev Konar;Robert Evans;Thomas Graves;Jason Lowe;Hitesh Shah;Siddharth Seth;Bikas Saha;Carlo Curino;Owen O'Malley;Sanjay Radia;Benjamin Reed;Eric Baldeschwieler
Affiliations:
hortonworks.com;hortonworks.com;microsoft.com;inmobi.com;hortonworks.com;yahoo-inc.com;yahoo-inc.com;yahoo-inc.com;hortonworks.com;hortonworks.com;hortonworks.com;microsoft.com;hortonworks.com;hortonworks.com;facebook.com;hortonworks.com
Venue:
Proceedings of the 4th annual Symposium on Cloud Computing
Year:
2013

Citing 20

The mythical man-month (anniversary ed.)
Core Algorithms of the Maui Scheduler. JSSPP '01 Revised Papers from the 7th International Workshop on Job Scheduling Strategies for Parallel Processing
Distributed computing in practice: the Condor experience: Research Articles. Concurrency and Computation: Practice & Experience - Grid Performance
The making of TPC-DS. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
A batch scheduler with high level components. CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2
Dryad: distributed data-parallel programs from sequential building blocks. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters. Communications of the ACM - 50th anniversary issue: 1958 - 2008
Pig latin: a not-so-foreign language for data processing. Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SCOPE: easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment
Pregel: a system for large-scale graph processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Spark: cluster computing with working sets. HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
The Hadoop Distributed File System. MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
Mesos: a platform for fine-grained resource sharing in the data center. Proceedings of the 8th USENIX conference on Networked systems design and implementation
Managing data transfers in computer clusters with orchestra. Proceedings of the ACM SIGCOMM 2011 conference
Towards predictable datacenter networks. Proceedings of the ACM SIGCOMM 2011 conference
Dynamic virtual clustering with xen and moab. ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Oozie: towards a scalable workflow management system for Hadoop. Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
Omega: flexible, scalable schedulers for large compute clusters. Proceedings of the 8th ACM European Conference on Computer Systems
REEF: retainable evaluator execution framework. Proceedings of the VLDB Endowment

Abstract

The initial design of Apache Hadoop [1] was tightly focused on running massive MapReduce jobs to process a web crawl. For increasingly diverse companies, Hadoop has become the data and computational agorá, the de facto place where data and computational resources are shared and accessed. This broad adoption and ubiquitous usage have stretched the initial design well beyond its intended target, exposing two key shortcomings: 1) tight coupling of a specific programming model with the resource management infrastructure, forcing developers to abuse the MapReduce programming model, and 2) centralized handling of jobs' control flow, which resulted in endless scalability concerns for the scheduler. In this paper, we summarize the design, development, and current state of deployment of the next generation of Hadoop's compute platform: YARN. The new architecture we introduced decouples the programming model from the resource management infrastructure, and delegates many scheduling functions (e.g., task fault-tolerance) to per-application components. We provide experimental evidence demonstrating the improvements we made, confirm improved efficiency by reporting the experience of running YARN in production environments (including 100% of Yahoo! grids), and confirm the flexibility claims by discussing the porting of several programming frameworks onto YARN, viz. Dryad, Giraph, Hoya, Hadoop MapReduce, REEF, Spark, Storm, and Tez.
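
The separation described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not YARN's API: the class names `ResourceManager` and `AppMaster`, the slot-counting model, and the retry policy are all hypothetical choices made for illustration. The point it tries to show is that the cluster-wide component only hands out capacity, while a per-application component decides which tasks run and how failures are retried.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical cluster-wide component: tracks free capacity, knows nothing about tasks."""
    free_slots: int

    def allocate(self, wanted: int) -> int:
        """Grant up to `wanted` slots and return how many were granted."""
        granted = min(wanted, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, count: int) -> None:
        """Return previously granted slots to the pool."""
        self.free_slots += count


@dataclass
class AppMaster:
    """Hypothetical per-application component: owns task ordering and fault tolerance."""
    tasks: list                      # list of (name, callable taking the attempt number)
    max_attempts: int = 3
    results: dict = field(default_factory=dict)

    def run(self, rm: ResourceManager) -> dict:
        pending = [(name, fn, 1) for name, fn in self.tasks]
        while pending:
            granted = rm.allocate(len(pending))
            if granted == 0:
                raise RuntimeError("no capacity available")
            batch, pending = pending[:granted], pending[granted:]
            try:
                for name, fn, attempt in batch:
                    try:
                        self.results[name] = fn(attempt)
                    except Exception:
                        # Retry policy lives in the application, not in the ResourceManager.
                        if attempt >= self.max_attempts:
                            raise
                        pending.append((name, fn, attempt + 1))
            finally:
                rm.release(granted)
        return self.results


def flaky(attempt: int) -> str:
    """Simulated task that fails on its first attempt."""
    if attempt < 2:
        raise IOError("simulated failure")
    return "ok on attempt %d" % attempt


rm = ResourceManager(free_slots=2)
app = AppMaster(tasks=[("a", lambda attempt: 1), ("b", flaky), ("c", lambda attempt: 3)])
results = app.run(rm)
```

In this toy model, a different programming framework would supply a different `AppMaster` while reusing the same `ResourceManager` unchanged, which is the kind of decoupling the abstract argues for.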