We introduce a new reliability infrastructure for file systems called I/O shepherding. I/O shepherding allows a file system developer to craft nuanced reliability policies to detect and recover from a wide range of storage system failures. We incorporate shepherding into the Linux ext3 file system through a set of changes to the consistency management subsystem, layout engine, disk scheduler, and buffer cache. The resulting file system, CrookFS, enables a broad class of policies to be easily and correctly specified. We implement numerous policies, incorporating data protection techniques such as retry, parity, mirrors, checksums, sanity checks, and data structure repairs; even complex policies can be implemented in fewer than 100 lines of code, confirming the power and simplicity of the shepherding framework. We also demonstrate that shepherding is properly integrated, adding less than 5% overhead to the I/O path.
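To make the idea of a reliability policy concrete, the Python sketch below combines three of the techniques named above (retry, checksums, and mirrors) into a single read path. It is an illustration under stated assumptions, not CrookFS's policy language or API: `Disk`, `ReadError`, and `shepherded_read` are hypothetical names, the disks are simulated in memory with injectable faults, and CRC-32 stands in for whatever checksum a real policy would use.

```python
import zlib
from dataclasses import dataclass, field


class ReadError(Exception):
    """Raised when a simulated disk fails a read (e.g. a latent sector error)."""


@dataclass
class Disk:
    """Hypothetical in-memory block device with injectable read faults."""
    blocks: dict = field(default_factory=dict)
    failing_reads: dict = field(default_factory=dict)  # block -> reads left to fail

    def read(self, blk: int) -> bytes:
        remaining = self.failing_reads.get(blk, 0)
        if remaining > 0:
            self.failing_reads[blk] = remaining - 1
            raise ReadError(f"read of block {blk} failed")
        return self.blocks[blk]

    def write(self, blk: int, data: bytes) -> None:
        self.blocks[blk] = data


def shepherded_read(primary: Disk, mirror: Disk, checksums: dict,
                    blk: int, retries: int = 2) -> bytes:
    """Hypothetical read policy: retry, verify checksum, fall back to mirror,
    then repair the primary copy from the mirror."""
    expected = checksums[blk]
    for _ in range(retries + 1):
        try:
            data = primary.read(blk)
        except ReadError:
            continue  # possibly transient fault: retry the primary copy
        if zlib.crc32(data) == expected:
            return data
        break  # checksum mismatch: rereading is unlikely to help, use mirror
    data = mirror.read(blk)  # may raise ReadError if the mirror also fails
    if zlib.crc32(data) != expected:
        raise ReadError(f"block {blk}: both copies fail checksum")
    primary.write(blk, data)  # repair the bad primary copy
    return data


if __name__ == "__main__":
    good = b"inode table"
    crc = {7: zlib.crc32(good)}
    primary = Disk(blocks={7: b"inode tablX"})  # silently corrupted copy
    mirror = Disk(blocks={7: good})
    print(shepherded_read(primary, mirror, crc, 7))
```

The ordering is one possible design choice: read errors are retried because they may be transient, whereas a checksum mismatch goes straight to the mirror, since rereading a corrupted block would most likely return the same bytes.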