On the Benefits of a Workflow-Aware File System in High-Performance Computing Systems

  • Authors:
  • Yang Wang; Paul Lu

  • Affiliations:
  • University of Alberta, Canada; University of Alberta, Canada

  • Venue:
  • HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
  • Year:
  • 2005

Abstract

Traditional high-performance computing (HPC) systems have independent job schedulers and file systems that do not interact in substantial ways. We make the case that some integration of the scheduler and the file system can have three main benefits. First, the dataflow dependencies between the jobs in a workflow can be inferred by combining the scheduler's knowledge of the jobs (and possibly the control flow) with the file system's knowledge of which files are accessed. Second, the dataflow information can be used to increase the concurrency of workflow instances when there are (potential) filename conflicts. Third, when workflows need to be re-computed, only the affected jobs need to be re-executed. We present the design and a simulation study of the Workflow-Aware File System (WaFS). Our design layers a Namespace Manager (NM) on top of existing file systems to provide, for example, a dataflow engine and a versioned file system. Our simulation study (with a specific set of application parameters) shows that a combined WaFS-aware file system and scheduler can significantly improve makespans for intensive workloads and re-compute jobs efficiently.
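The first and third benefits can be illustrated with a small sketch. It is not the WaFS implementation: it assumes a hypothetical access log in which each record carries a timestamp, the job ID (the scheduler's knowledge) and the path and access mode (the file system's knowledge). A read is bound to the most recent earlier write of the same path, loosely mimicking a versioned file system; per-instance namespaces of the Namespace Manager are not modeled. The helper names `infer_dataflow`, `affected_jobs` and `rerun_order`, and the example jobs and files, are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical access-log record: (timestamp, job_id, path, mode), mode is "r" or "w".
AccessEvent = Tuple[int, str, str, str]


def infer_dataflow(events: List[AccessEvent]) -> Dict[str, Set[str]]:
    """Infer producer -> consumer edges: job A feeds job B if B reads a version A wrote."""
    latest_writer: Dict[str, str] = {}  # path -> job that wrote its current version
    edges: Dict[str, Set[str]] = defaultdict(set)
    for _, job, path, mode in sorted(events):  # replay accesses in time order
        if mode == "w":
            latest_writer[path] = job  # each write creates a new version
        elif mode == "r":
            producer = latest_writer.get(path)
            if producer is not None and producer != job:
                edges[producer].add(job)
    return dict(edges)


def affected_jobs(events: List[AccessEvent], edges: Dict[str, Set[str]],
                  changed_paths: Iterable[str]) -> Set[str]:
    """Jobs that read a changed input, plus everything downstream of them."""
    changed = set(changed_paths)
    seeds = {job for _, job, path, mode in events if mode == "r" and path in changed}
    affected: Set[str] = set()
    queue = deque(sorted(seeds))
    while queue:
        job = queue.popleft()
        if job not in affected:
            affected.add(job)
            queue.extend(sorted(edges.get(job, ())))
    return affected


def rerun_order(edges: Dict[str, Set[str]], affected: Set[str]) -> List[str]:
    """Topologically order only the affected jobs (Kahn's algorithm).

    Assumes the inferred graph is acyclic; jobs on a cycle would be omitted.
    """
    indegree = {job: 0 for job in affected}
    for producer in affected:
        for consumer in edges.get(producer, ()):
            if consumer in affected:
                indegree[consumer] += 1
    ready = deque(sorted(job for job, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for consumer in sorted(edges.get(job, ())):
            if consumer in affected:
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    ready.append(consumer)
    return order


# Hypothetical four-job workflow.
EVENTS: List[AccessEvent] = [
    (1, "prep", "input.csv", "r"), (2, "prep", "clean.csv", "w"),
    (3, "stats", "clean.csv", "r"), (4, "stats", "stats.txt", "w"),
    (5, "plot", "clean.csv", "r"), (6, "plot", "fig.png", "w"),
    (7, "report", "stats.txt", "r"), (8, "report", "params.cfg", "r"),
    (9, "report", "report.pdf", "w"),
]

if __name__ == "__main__":
    graph = infer_dataflow(EVENTS)
    print(rerun_order(graph, affected_jobs(EVENTS, graph, {"params.cfg"})))
    print(rerun_order(graph, affected_jobs(EVENTS, graph, {"input.csv"})))
```

Changing only `params.cfg` re-runs just `report`, while changing `input.csv` re-runs all four jobs in dependency order (`prep` first, `report` last). Restricting the topological sort to the affected subgraph is what keeps unaffected jobs from being re-executed.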