Oozie: towards a scalable workflow management system for Hadoop

  • Authors:
  • Mohammad Islam; Angelo K. Huang; Mohamed Battisha; Michelle Chiang; Santhosh Srinivasan; Craig Peters; Andreas Neumann; Alejandro Abdelnur

  • Affiliations:
  • Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Yahoo! Inc., Sunnyvale, CA; Cloudera Inc., Palo Alto, CA

  • Venue:
  • Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies
  • Year:
  • 2012

Abstract

Hadoop is a massively scalable parallel computation platform capable of running hundreds of jobs concurrently and many thousands of jobs per day. Managing all these computations demands a workflow and scheduling system. In this paper, we identify four indispensable qualities that a Hadoop workflow management system must have, namely Scalability, Security, Multi-tenancy, and Operability. We find that each conventional workflow management tool lacks at least one of these qualities, and we therefore present Apache Oozie, a workflow management system specialized for Hadoop. We discuss the architecture of Oozie, share our production experience over the last few years at Yahoo, and evaluate Oozie's scalability and performance.