Tolerating file-system mistakes with EnvyFS

  • Authors:
  • Lakshmi N. Bairavasundaram; Swaminathan Sundararaman; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau

  • Affiliations:
  • Computer Sciences Department, University of Wisconsin-Madison (all authors)

  • Venue:
  • USENIX '09: Proceedings of the 2009 USENIX Annual Technical Conference
  • Year:
  • 2009

Abstract

We introduce EnvyFS, an N-version local file system designed to improve reliability in the face of file-system bugs. EnvyFS, implemented as a thin VFS-like layer near the top of the storage stack, replicates file-system metadata and data across existing and diverse commodity file systems (e.g., ext3, ReiserFS, JFS). It uses majority-consensus to operate correctly despite the sometimes faulty behavior of an underlying commodity child file system. Through experimentation, we show EnvyFS is robust to a wide range of failure scenarios, thus delivering on its promise of increased fault tolerance; however, performance and capacity overheads can be significant. To remedy this issue, we introduce SubSIST, a novel single-instance store designed to operate in an N-version environment. In the common case where all child file systems are working properly, SubSIST coalesces most blocks and thus greatly reduces time and space overheads. In the rare case where a child makes a mistake, SubSIST does not propagate the error to other children, and thus preserves the ability of EnvyFS to detect and recover from bugs that affect data reliability. Overall, EnvyFS and SubSIST combine to significantly improve reliability with only modest space and time overheads.
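
The two ideas in the abstract, majority consensus across child file systems and a single-instance store that coalesces identical blocks, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `majority_read` and `SingleInstanceStore`, the use of SHA-256 content hashes to detect identical blocks, and the in-memory dictionaries are all assumptions made for illustration only.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_read(replies: List[bytes]) -> Optional[bytes]:
    """Return the value reported by a strict majority of children, else None."""
    if not replies:
        return None
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes > len(replies) // 2 else None


class SingleInstanceStore:
    """Toy content-addressed store (hypothetical): identical blocks are kept once.

    A block written with different contents gets its own entry, so one
    child's bad write never overwrites the copy shared by the others.
    Reference counting and space reclamation are omitted.
    """

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}            # digest -> block contents
        self.refs: Dict[Tuple[str, int], str] = {}    # (child, block_no) -> digest

    def write(self, child: str, block_no: int, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(digest, data)          # coalesce identical contents
        self.refs[(child, block_no)] = digest

    def read(self, child: str, block_no: int) -> bytes:
        return self.blocks[self.refs[(child, block_no)]]


children = ("ext3", "reiserfs", "jfs")
store = SingleInstanceStore()
store.write("ext3", 0, b"hello")
store.write("reiserfs", 0, b"hello")
store.write("jfs", 0, b"hellp")                       # simulated buggy child

replies = [store.read(child, 0) for child in children]
result = majority_read(replies)                       # value agreed on by 2 of 3
stored_copies = len(store.blocks)                     # distinct blocks actually kept
```

In this sketch the two correct children share one stored block, while the faulty child's block is stored separately, so the vote across the three replies still recovers the correct contents.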