Towards reliable storage systems

Authors:
Andrea C. Arpaci-Dusseau;Remzi H. Arpaci-Dusseau;Haryadi Sudirman Gunawi
Affiliations:
University of Wisconsin-Madison;-;University of Wisconsin-Madison
Venue:
Towards reliable storage systems
Year:
2009

Abstract

Users are storing increasingly massive amounts of data, storage software is growing more complex, and cheap, less reliable hardware is increasingly common. Together, these trends present a formidable challenge: how can we promise users that storage systems work robustly in spite of the complex failures that can arise? In the first part of this dissertation, we respond to this question by analyzing three reliability components present in many modern file systems: the file system checker (fsck), failure detection and recovery policies (failure policy), and journaling. We find that these subsystems are deficient in handling partial disk failures. In the fsck analysis, we find that some repairs are buggy (leaving the repaired file system more corrupted than before) and some repairs are missing (leaving some corruptions unattended). In the failure policy analysis, we observe a major problem of diffused fault handling, which makes policies inconsistent, buggy, and inflexible to change. In the journaling analysis, we find that current journaling frameworks cannot recover from checkpoint write failures, and hence such write failures are intentionally ignored. The results of our analysis suggest that managing failures is hard (as the developer's comment also hints), and hence call for novel solutions for building more reliable storage systems.
In the second part of this dissertation, we present our solutions to the problems above. First, we re-architect the file system checker by introducing SQCK, a robust file system checker that employs a declarative query language. By writing hundreds of checks and repairs in a query language (e.g., SQL), the high-level intent of the checker can be specified in a clear and compact manner. We show that SQCK provides the same functionality as the Linux ext2/3 checker with elegant and compact queries.
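To make the idea of a declarative check concrete, the following is a minimal sketch of what one check-and-repair pair expressed in SQL might look like. It is not taken from SQCK: the table names (`superblock`, `inode_blocks`), the columns, and the repair action (clearing the pointer) are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

def build_demo_db():
    """Load a tiny, hypothetical file system image into in-memory tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE superblock (total_blocks INTEGER);
        CREATE TABLE inode_blocks (ino INTEGER, slot INTEGER, block INTEGER);
        INSERT INTO superblock VALUES (1000);
        -- inode 12, slot 1 points past the end of the (assumed) 1000-block disk
        INSERT INTO inode_blocks VALUES (12, 0, 40), (12, 1, 5000), (13, 0, 77);
    """)
    return conn

# Check: find block pointers that fall outside the file system.
CHECK_BAD_POINTERS = """
    SELECT ino, slot, block FROM inode_blocks
    WHERE block <> 0 AND block >= (SELECT total_blocks FROM superblock)
    ORDER BY ino, slot
"""

# Repair (an assumed policy): clear every out-of-range pointer.
REPAIR_BAD_POINTERS = """
    UPDATE inode_blocks SET block = 0
    WHERE block >= (SELECT total_blocks FROM superblock)
"""

def check_and_repair(conn):
    """Return the offending pointers, then apply the repair query."""
    bad = conn.execute(CHECK_BAD_POINTERS).fetchall()
    conn.execute(REPAIR_BAD_POINTERS)
    conn.commit()
    return bad
```

In this sketch the intent of the check ("no pointer may exceed the block count") is visible in a single query, which is the kind of compactness the abstract attributes to a query-language checker.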
Second, we present EDP, a static analysis tool that shows how error codes flow through file systems and storage drivers. We observe that low-level errors are sometimes lost as they travel through the many layers of the storage subsystem: out of the 9022 function calls through which the analyzed error codes propagate, we find that 1153 calls (13%) do not correctly save the propagated error codes. Our detailed analysis shows that many violations are not corner-case mistakes; the return codes of some functions are consistently ignored. Finally, we present I/O shepherding, a new reliability infrastructure for file systems. With I/O shepherding, the reliability policies of a file system are well-defined, easy to understand, and simple to tailor to the environment and workload. As part of this framework, we also introduce chained transactions, a novel and more powerful transactional model for checkpoint recoveries. We show that I/O shepherding enables simple, powerful, and correctly implemented reliability policies by implementing an increasingly complex set of policies.
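The notion of an error code that is not saved, as described for EDP above, can be illustrated with a toy static check. This is only a sketch under stated assumptions: it analyzes Python source with the standard `ast` module rather than the C code EDP targets, it flags just one pattern (a call used as a bare statement, so its return value is discarded), and the function names `write_block` and `sync_journal` are hypothetical.

```python
import ast

# Hypothetical set of functions assumed to return an error code.
ERROR_RETURNING = {"write_block", "sync_journal"}

def find_dropped_errors(source, error_funcs=ERROR_RETURNING):
    """Return (function name, line number) for calls whose result is discarded."""
    tree = ast.parse(source)
    dropped = []
    for node in ast.walk(tree):
        # An ast.Expr node is an expression used as a statement,
        # so whatever the call returns is thrown away.
        if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            func = node.value.func
            if isinstance(func, ast.Name) and func.id in error_funcs:
                dropped.append((func.id, node.lineno))
    return sorted(dropped, key=lambda item: item[1])

# Sample input: the first call saves and checks its error code,
# the second call ignores it.
SAMPLE = '''
def commit(buf):
    err = write_block(buf)
    if err:
        return err
    sync_journal()
    return 0
'''
```

Calling `find_dropped_errors(SAMPLE)` reports the `sync_journal()` call and leaves the checked `write_block` call alone. A real propagation analysis would also have to follow error codes that are saved but later overwritten or never examined, which this sketch does not attempt.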