ACM Transactions on Computer Systems (TOCS)
A case for redundant arrays of inexpensive disks (RAID)
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Fault Injection Experiments Using FIAT
IEEE Transactions on Computers
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
The design and implementation of a log-structured file system
ACM Transactions on Computer Systems (TOCS)
FINE: A Fault Injection and Monitoring Environment for Tracing the UNIX System Behavior Under Faults
IEEE Transactions on Software Engineering - Special issue on software reliability
The HP AutoRAID hierarchical storage system
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Practical loss-resilient codes
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
File server scaling with network-attached secure disks
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Tolerating multiple failures in RAID architectures with optimal storage and uniform declustering
Proceedings of the 24th annual international symposium on Computer architecture
A large-scale study of file-system contents
SIGMETRICS '99 Proceedings of the 1999 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
End-to-end arguments in system design
ACM Transactions on Computer Systems (TOCS)
Pilot: an operating system for a personal computer
Communications of the ACM
Information and control in gray-box systems
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
The Book of SCSI
Inside Windows NT
VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
Unifying File System Protection
Proceedings of the General Track: 2002 USENIX Annual Technical Conference
Detection of Defective Media in Disks
Proceedings of the IEEE International Workshop on Defect and Fault Tolerance in VLSI Systems
Performance Evaluation of Exception Handling in I/O Libraries
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Measuring Fault Tolerance with the FTAPE Fault Injection Tool
MMB '95 Proceedings of the 8th International Conference on Modelling Techniques and Tools for Computer Performance Evaluation: Quantitative Evaluation of Computing and Communication Systems
HOTOS '01 Proceedings of the Eighth Workshop on Hot Topics in Operating Systems
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Improving the reliability of commodity operating systems
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
X-RAY: A Non-Invasive Exclusive Caching Mechanism for RAIDs
Proceedings of the 31st annual international symposium on Computer architecture
Commercial Fault Tolerance: A Tale of Two Systems
IEEE Transactions on Dependable and Secure Computing
Disk Scrubbing in Large Archival Storage Systems
MASCOTS '04 Proceedings of the The IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Reliability and security of RAID storage systems and D2D archives using SATA disk drives
ACM Transactions on Storage (TOS)
FS: An In-Kernel Integrity Checker and Intrusion Detection File System
LISA '04 Proceedings of the 18th USENIX conference on System administration
Deconstructing Commodity Storage Clusters
Proceedings of the 32nd annual international symposium on Computer Architecture
Model-Based Failure Analysis of Journaling File Systems
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
Semantically-Smart Disk Systems
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
More Than an Interface---SCSI vs. ATA
FAST '03 Proceedings of the 2nd USENIX Conference on File and Storage Technologies
Awarded Best Paper! -- Row-Diagonal Parity for Double Disk Failure Correction
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Awarded Best Student Paper! -- Improving Storage System Availability with D-GRAID
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
MEMS-based Storage Devices and Standard Disk Interfaces: A Square Peg in a Round Hole?
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
Trading capacity for performance in a disk array
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Microreboot — A technique for cheap recovery
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Using model checking to find serious file system errors
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Enhancing server availability and security through failure-oblivious computing
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Towards availability benchmarks: a case study of software raid systems
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Scalability in the XFS file system
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Ensuring data integrity in storage: techniques and applications
Proceedings of the 2005 ACM workshop on Storage security and survivability
Semantically-smart disk systems: past, present, and future
ACM SIGMETRICS Performance Evaluation Review - Design, implementation, and performance of storage systems
Limiting trust in the storage stack
Proceedings of the second ACM workshop on Storage security and survivability
Using model checking to find serious file system errors
ACM Transactions on Computer Systems (TOCS)
A fresh look at the reliability of long-term digital storage
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Chunkfs: using divide-and-conquer to improve file system reliability and repair
HOTDEP'06 Proceedings of the 2nd conference on Hot Topics in System Dependability - Volume 2
An analysis of latent sector errors in disk drives
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you?
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
Understanding disk failure rates: What does an MTTF of 1,000,000 hours mean to you?
ACM Transactions on Storage (TOS)
Zyzzyva: speculative byzantine fault tolerance
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Improving file system reliability with I/O shepherding
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
EXPLODE: a lightweight, general system for finding serious storage system errors
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
Exploiting type-awareness in a self-recovering disk
Proceedings of the 2007 ACM workshop on Storage security and survivability
The effects of metadata corruption on nfs
Proceedings of the 2007 ACM workshop on Storage security and survivability
SafeStore: a durable and practical storage system
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Parity lost and parity regained
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
EIO: error handling is occasionally correct
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
An analysis of data corruption in the storage stack
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
A nine year study of file system and storage benchmarking
ACM Transactions on Storage (TOS)
ACM Transactions on Computer Systems (TOCS)
Zyzzyva: speculative Byzantine fault tolerance
Communications of the ACM - Remembering Jim Gray
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
An analysis of data corruption in the storage stack
ACM Transactions on Storage (TOS)
Self-stabilizing device drivers
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Undetected disk errors in RAID arrays
IBM Journal of Research and Development
Generating realistic impressions for file-system benchmarking
FAST '09 Proccedings of the 7th conference on File and storage technologies
A systematic approach to system state restoration during storage controller micro-recovery
FAST '09 Proccedings of the 7th conference on File and storage technologies
LADIS '08 Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware
ACM Transactions on Storage (TOS)
Generating realistic impressions for file-system benchmarking
ACM Transactions on Storage (TOS)
Better I/O through byte-addressable, persistent memory
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Zyzzyva: Speculative Byzantine fault tolerance
ACM Transactions on Computer Systems (TOCS)
Uncovering errors: the cost of detecting silent data corruption
Proceedings of the 4th Annual Workshop on Petascale Data Storage
Why panic()?: improving reliability with restartable file systems
ACM SIGOPS Operating Systems Review
Self-stabilizing device drivers
SSS'06 Proceedings of the 8th international conference on Stabilization, safety, and security of distributed systems
DARC: design and evaluation of an I/O controller for data protection
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Understanding latent sector errors and how to protect against them
ACM Transactions on Storage (TOS)
Membrane: Operating system support for restartable file systems
ACM Transactions on Storage (TOS)
Keeping bits safe: how hard can it be?
Communications of the ACM
End-to-end data integrity for file systems: a ZFS case study
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Understanding latent sector errors and how to protect against them
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
Membrane: operating system support for restartable file systems
FAST'10 Proceedings of the 8th USENIX conference on File and storage technologies
SQCK: a declarative file system checker
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Tolerating file-system mistakes with EnvyFS
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Rump file systems: kernel code reborn
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
HotStorage'10 Proceedings of the 2nd USENIX conference on Hot topics in storage and file systems
HotStorage'10 Proceedings of the 2nd USENIX conference on Hot topics in storage and file systems
Keeping Bits Safe: How Hard Can It Be?
Queue - Storage
Depot: cloud storage with minimal trust
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Mnemosyne: lightweight persistent memory
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Making the common case the only case with anticipatory memory allocation
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
Chunkfs: using divide-and-conquer to improve file system reliability and repair
HotDep'06 Proceedings of the Second conference on Hot topics in system dependability
Using declarative invariants for protecting file-system integrity
PLOS '11 Proceedings of the 6th Workshop on Programming Languages and Operating Systems
A file is not a file: understanding the I/O behavior of Apple desktop applications
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
PREFAIL: a programmable tool for multiple-failure injection
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Efficient Testing of Recovery Code Using Fault Injection
ACM Transactions on Computer Systems (TOCS)
Depot: Cloud Storage with Minimal Trust
ACM Transactions on Computer Systems (TOCS)
Towards reliable storage systems
Towards reliable storage systems
Making the common case the only case with anticipatory memory allocation
ACM Transactions on Storage (TOS)
Framework for enabling highly available distributed applications for utility computing
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Recon: verifying file system consistency at runtime
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
VM aware journaling: improving journaling file system performance in virtualization environments
Software—Practice & Experience
A File Is Not a File: Understanding the I/O Behavior of Apple Desktop Applications
ACM Transactions on Computer Systems (TOCS)
Recon: Verifying file system consistency at runtime
ACM Transactions on Storage (TOS)
Improving Bandwidth Efficiency for Consistent Multistream Storage
ACM Transactions on Storage (TOS)
Robustness in the Salus scalable block store
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
*-Box: towards reliability and consistency in dropbox-like file synchronization services
HotStorage'13 Proceedings of the 5th USENIX conference on Hot Topics in Storage and File Systems
A Study of Linux File System Evolution
ACM Transactions on Storage (TOS)
A study of Linux file system evolution
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Getting real: lessons in transitioning research simulations into hardware systems
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Understanding the robustness of SSDS under power fault
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
ViewBox: integrating local file systems with cloud storage services
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Checking the integrity of transactional mechanisms
FAST'14 Proceedings of the 12th USENIX conference on File and Storage Technologies
Hi-index | 0.02 |
Commodity file systems trust disks to either work or fail completely, yet modern disks exhibit more complex failure modes. We suggest a new fail-partial failure model for disks, which incorporates realistic localized faults such as latent sector errors and block corruption. We then develop and apply a novel failure-policy fingerprinting framework, to investigate how commodity file systems react to a range of more realistic disk failures. We classify their failure policies in a new taxonomy that measures their Internal RObustNess (IRON), which includes both failure detection and recovery techniques. We show that commodity file system failure policies are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures. Finally, we design, implement, and evaluate a prototype IRON file system, Linux ixt3, showing that techniques such as in-disk checksumming, replication, and parity greatly enhance file system robustness while incurring minimal time and space overheads.