Design of an active storage cluster file system for DAG workflows

  • Authors: Patrick Donnelly; Douglas Thain

  • Affiliations: University of Notre Dame, Notre Dame, IN; University of Notre Dame, Notre Dame, IN

  • Venue: DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

  • Year: 2013

Abstract

We present the conceptual design of Confuga, a cluster file system designed to meet the needs of DAG-structured workflows. Today's premier cluster file system, Hadoop, is commonly used to support large petascale data sets on commodity hardware and to exploit active storage through Map-Reduce, a specific workflow pattern. Unfortunately, DAG-structured workflows have very different requirements from Map-Reduce workflows: whole-file access is standard, and jobs commonly depend on multiple files. Confuga will meet these requirements by replicating whole files rather than striping them as Hadoop does, by exploiting the consistency semantics of DAG-structured workflows, and by permitting multiple dependencies in job descriptions. To the end user, Confuga will appear as a drop-in replacement for a batch system and a file system, combined into a single entity that can be invoked by existing workflow managers. In this paper, we describe the design philosophy of Confuga, sketch the major components of the system, and explain how the system will behave under expected workloads.
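
To make the two requirements named in the abstract concrete (jobs with multiple file dependencies, and whole-file replicas rather than stripes), the following is a minimal illustrative sketch. It is not Confuga's implementation or interface: the `Job` record, the `schedule_order` and `pick_node` helpers, the file names, and the placement policy (run the job on the node that already holds the most input bytes) are all assumptions made for this example.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Job:
    """Hypothetical job description: one command, many whole-file dependencies."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def schedule_order(jobs):
    """Return job names in an order that respects the file dependencies (the DAG)."""
    # Map each output file to the job that produces it.
    producer = {out: job.name for job in jobs for out in job.outputs}
    # A job depends on the producers of its inputs; files with no producer
    # are assumed to already exist in the file system.
    graph = {
        job.name: {producer[f] for f in job.inputs if f in producer}
        for job in jobs
    }
    return list(TopologicalSorter(graph).static_order())


def pick_node(job, replicas, sizes):
    """Assumed policy: choose the node holding the most input bytes locally.

    replicas maps a file name to the set of nodes storing a whole copy of it;
    sizes maps a file name to its size in bytes. Returns None if no node
    holds any input.
    """
    local_bytes = {}
    for f in job.inputs:
        for node in replicas.get(f, ()):
            local_bytes[node] = local_bytes.get(node, 0) + sizes[f]
    if not local_bytes:
        return None
    # Sorting first makes ties break deterministically by node name.
    return max(sorted(local_bytes), key=local_bytes.get)


jobs = [
    Job("split", ("input.dat",), ("a.dat", "b.dat")),
    Job("merge", ("a.dat", "b.dat"), ("out.dat",)),  # two dependencies
]
replicas = {"a.dat": {"node1", "node2"}, "b.dat": {"node2"}}
sizes = {"a.dat": 100, "b.dat": 50}

print(schedule_order(jobs))
print(pick_node(jobs[1], replicas, sizes))
```

Because each file is stored whole on a node, a job can run where its inputs already live, and only the replicas it lacks need to be transferred. With striped storage, a whole-file read would generally touch several nodes regardless of where the job runs.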