Chunkfs: using divide-and-conquer to improve file system reliability and repair

  • Authors: Val Henson; Amit Gud; Arjan van de Ven; Zach Brown

  • Affiliations: Open Source Technology Center, Intel Corporation; Kansas State University; Open Source Technology Center, Intel Corporation; Oracle, Inc.

  • Venue: HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
  • Year: 2006

Abstract

The absolute time required to check and repair a file system is increasing because disk capacities are growing faster than disk bandwidth, while seek time remains almost unchanged. At the same time, file system repair is becoming more common: the per-bit error rate of disks is not dropping as fast as the number of bits per disk is growing, so each disk sees more errors. With existing file systems, a single corrupted metadata block requires the entire file system to be unmounted, checked, and repaired, a process that takes hours or days to complete and during which the data is completely unavailable. The resulting "fsck time crunch" is already making file systems only a few terabytes in size impractical to administer. We propose chunkfs, which divides on-disk file system data into small, individually repairable fault-isolation domains while preserving normal file system semantics.
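The scaling argument above can be illustrated with a rough back-of-envelope sketch. It is not taken from the paper: the capacity, chunk size, and bandwidth figures are hypothetical, and the time to read a region once sequentially is used only as a crude proxy for check time (a real fsck is typically seek-bound and reads mostly metadata). The helper name `scan_hours` is likewise an assumption made for illustration.

```python
TB = 10**12
GB = 10**9
MB = 10**6


def scan_hours(region_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Hours needed to read `region_bytes` once at the given sequential bandwidth."""
    return region_bytes / bandwidth_bytes_per_s / 3600.0


# Hypothetical figures chosen only for illustration (not from the paper).
BANDWIDTH = 100 * MB      # assumed sustained disk bandwidth, bytes per second
FS_SIZE = 4 * TB          # assumed size of the whole file system
CHUNK_SIZE = 10 * GB      # assumed size of one fault-isolation domain

whole_fs_hours = scan_hours(FS_SIZE, BANDWIDTH)
one_chunk_hours = scan_hours(CHUNK_SIZE, BANDWIDTH)

print(f"whole file system: {whole_fs_hours:.2f} h")
print(f"single chunk:      {one_chunk_hours * 60:.2f} min")
print(f"ratio:             {whole_fs_hours / one_chunk_hours:.0f}x")
```

Under these assumptions the cost of checking scales with the size of the region that must be examined, which is why confining a repair to one small domain matters more as capacity outpaces bandwidth.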