The absolute time required to check and repair a file system is increasing because disk capacities are growing faster than disk bandwidth, while seek time remains almost unchanged. At the same time, file system repair is becoming more common, because the per-bit error rate of disks is not dropping as fast as the number of bits per disk is growing, resulting in more errors per disk. With existing file systems, a single corrupted metadata block requires the entire file system to be unmounted, checked, and repaired, a process that takes hours or days to complete, during which the data is completely unavailable. The resulting "fsck time crunch" is already making file systems of only a few terabytes impractical to administer. We propose chunkfs, which divides on-disk file system data into small, individually repairable fault-isolation domains while preserving normal file system semantics.
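To make the fault-isolation idea concrete, the following is a minimal C sketch of per-chunk checking. It is illustrative only, not chunkfs source: the names struct chunk, check_and_repair_chunk, and NCHUNKS are hypothetical. The point it demonstrates is that after a crash or detected corruption, only chunks flagged as possibly inconsistent are taken offline and checked, so repair cost scales with the number of damaged chunks rather than with total file system size.

    /*
     * Illustrative sketch (not chunkfs code): repair is confined to the
     * chunks whose metadata may be inconsistent, so the rest of the
     * file system stays mounted and available. All structure and
     * function names here are hypothetical.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define NCHUNKS 8   /* assumed small count for the example */

    struct chunk {
        unsigned id;
        bool dirty;     /* metadata possibly inconsistent? */
        bool offline;   /* taken offline while being repaired */
    };

    /* Hypothetical per-chunk checker: only this chunk's own metadata
     * and its explicit cross-chunk references need validation. */
    static void check_and_repair_chunk(struct chunk *c)
    {
        c->offline = true;
        printf("fsck: checking chunk %u\n", c->id);
        /* ... walk this chunk's inodes and allocation bitmaps ... */
        c->dirty = false;
        c->offline = false;
    }

    int main(void)
    {
        struct chunk chunks[NCHUNKS] = {0};
        for (unsigned i = 0; i < NCHUNKS; i++)
            chunks[i].id = i;

        /* Suppose a crash left only two chunks marked dirty. */
        chunks[2].dirty = true;
        chunks[5].dirty = true;

        /* Per-chunk repair: cost is O(dirty chunks),
         * not O(file system size). */
        for (unsigned i = 0; i < NCHUNKS; i++)
            if (chunks[i].dirty)
                check_and_repair_chunk(&chunks[i]);

        return 0;
    }

The design choice this sketch highlights is the same one the abstract states: because each fault-isolation domain can be checked and repaired individually, a single corrupted metadata block no longer forces the entire file system offline.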