Operating System Support for Space Allocation in Grid Storage Systems

  • Authors:
  • Douglas Thain

  • Affiliations:
  • University of Notre Dame

  • Venue:
  • GRID '06 Proceedings of the 7th IEEE/ACM International Conference on Grid Computing
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Shared temporary storage space is often the constraining resource for clusters that serve as execution nodes in wide-area distributed systems. At least one large national-scale computing grid has reported a failure rate of as high as thirty percent of submitted jobs, often due to accidentally filled shared storage spaces. Previous systems have attacked this problem by adding space allocation to the distributed system interface. However, these allocations are not enforced at the filesystem level, and thus unexpected or unaccounted uses of storage may cause the system to fail. By adding an inexpensive allocation mechanism to the operating system, we may improve the robustness of such systems at minimal cost. In this paper, we describe an abstract model of space allocation in the file system and explore three implementations of the model: a user-level library, a recursive loopback filesystem, and a modified kernel filesystem. We evaluate the performance and completeness of these implementations and demonstrate that kernel support is essential to keeping the overhead low. Finally, we demonstrate empirically that a cluster under heavy filesystem load can be made more robust by adding allocations to the filesystem.