Explicit control in a batch-aware distributed file system

  • Authors:
  • John Bent; Douglas Thain; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny

  • Affiliations:
  • Computer Science Department, University of Wisconsin, Madison (all authors)

  • Venue:
  • NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
  • Year:
  • 2004

Abstract

We present the design, implementation, and evaluation of the Batch-Aware Distributed File System (BAD-FS), a system designed to orchestrate large, I/O-intensive batch workloads on remote computing clusters distributed across the wide area. BAD-FS consists of two novel components: a storage layer that exposes control of traditionally fixed policies such as caching, consistency, and replication; and a scheduler that exploits this control as necessary for different workloads. By extracting control from the storage layer and placing it within an external scheduler, BAD-FS manages both storage and computation in a coordinated way while gracefully dealing with cache consistency, fault tolerance, and space management issues in a workload-specific manner. Using both microbenchmarks and real workloads, we demonstrate the performance benefits of explicit control, delivering excellent end-to-end performance across the wide area.
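
The split the abstract describes, a storage layer that only exposes control operations and an external scheduler that decides policy, can be illustrated with a minimal sketch. This is not the BAD-FS implementation or its interface: the class names (`StorageServer`, `Job`, `Scheduler`), the methods (`load`, `evict`, `place`), and the oldest-first eviction rule are all hypothetical, chosen only to show policy living outside the storage layer.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical storage node: holds data but makes no policy decisions."""
    name: str
    capacity_mb: int
    cached: dict = field(default_factory=dict)  # dataset name -> size in MB

    def free_mb(self) -> int:
        return self.capacity_mb - sum(self.cached.values())

    # Explicit control operations, meant to be invoked only by the scheduler.
    def load(self, dataset: str, size_mb: int) -> None:
        if size_mb > self.free_mb():
            raise RuntimeError(f"{self.name}: no space for {dataset}")
        self.cached[dataset] = size_mb

    def evict(self, dataset: str) -> None:
        self.cached.pop(dataset, None)


@dataclass
class Job:
    name: str
    dataset: str
    size_mb: int


class Scheduler:
    """Hypothetical external scheduler that owns caching and space policy."""

    def __init__(self, servers):
        self.servers = list(servers)

    def place(self, job: Job) -> StorageServer:
        # Prefer a server that already caches the job's input data.
        for server in self.servers:
            if job.dataset in server.cached:
                return server
        # Otherwise choose the server with the most free space and make room
        # explicitly, evicting the oldest-loaded datasets first (assumed policy).
        target = max(self.servers, key=StorageServer.free_mb)
        for old in list(target.cached):
            if target.free_mb() >= job.size_mb:
                break
            target.evict(old)
        target.load(job.dataset, job.size_mb)
        return target


if __name__ == "__main__":
    a, b = StorageServer("a", 100), StorageServer("b", 50)
    sched = Scheduler([a, b])
    for job in [Job("j1", "x", 80), Job("j2", "x", 80), Job("j3", "y", 40)]:
        print(job.name, "->", sched.place(job).name)
```

In this sketch the storage servers never decide on their own what to cache or evict; a different workload could be served by swapping in a different `Scheduler` without touching `StorageServer`, which is the kind of workload-specific control the abstract refers to.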