Hadoop is a massively scalable parallel computation platform capable of running hundreds of jobs concurrently and many thousands of jobs per day. Managing all these computations demands a workflow and scheduling system. In this paper, we identify four indispensable qualities that a Hadoop workflow management system must have: scalability, security, multi-tenancy, and operability. We find that conventional workflow management tools lack at least one of these qualities, and we therefore present Apache Oozie, a workflow management system specialized for Hadoop. We discuss the architecture of Oozie, share our production experience at Yahoo over the last few years, and evaluate Oozie's scalability and performance.