Storage systems such as file systems, databases, and RAID systems have a simple, basic contract: you give them data, they do not lose or corrupt it. Often they store the only copy, making its irrevocable loss almost arbitrarily bad. Unfortunately, their code is exceptionally hard to get right, since it must correctly recover from any crash at any program point, no matter how their state was smeared across volatile and persistent memory. This paper describes EXPLODE, a system that makes it easy to systematically check real storage systems for errors. It takes user-written, potentially system-specific checkers and uses them to drive a storage system into tricky corner cases, including crash recovery errors. EXPLODE uses a novel adaptation of ideas from model checking, a comprehensive, heavy-weight formal verification technique, that makes its checking more systematic (and hopefully more effective) than a pure testing approach while being just as lightweight. EXPLODE is effective. It found serious bugs in a broad range of real storage systems (without requiring source code): three version control systems, Berkeley DB, an NFS implementation, ten file systems, a RAID system, and the popular VMware GSX virtual machine. We found bugs in every system we checked, 36 bugs in total, typically with little effort.
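The idea of driving a storage system through every crash point and checking what survives can be illustrated with a toy sketch. This is not EXPLODE's interface or implementation. `ToyStore`, `crash_states`, `check`, and the `buggy_sync` flag are hypothetical names invented for this example, and the model assumes that a crash may persist any subset of the writes still held in volatile memory.

```python
from itertools import combinations


def crash_states(disk, pending):
    """Yield every disk image reachable if a crash persists any subset of pending writes."""
    items = list(pending.items())
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            image = dict(disk)
            image.update(subset)
            yield image


class ToyStore:
    """Hypothetical key-value store with a volatile cache and a persistent 'disk'."""

    def __init__(self, buggy_sync=False):
        self.disk = {}
        self.cache = {}
        self.buggy_sync = buggy_sync

    def write(self, key, value):
        self.cache[key] = value  # buffered only; not yet durable

    def sync(self):
        items = list(self.cache.items())
        if self.buggy_sync:
            items = items[:-1]  # injected bug: the last buffered write is never flushed
        for key, value in items:
            self.disk[key] = value
            del self.cache[key]


def check(ops, buggy_sync=False):
    """Run ops, and after each one examine every possible crash image.

    Invariant checked: once sync() has returned, each synced key must hold
    either its synced value or a value written afterwards.
    """
    store = ToyStore(buggy_sync)
    written = {}  # latest value written per key
    durable = {}  # values the store has promised are on disk
    errors = []
    for step, op in enumerate(ops):
        if op[0] == "write":
            store.write(op[1], op[2])
            written[op[1]] = op[2]
        else:  # "sync"
            store.sync()
            durable = dict(written)
        for image in crash_states(store.disk, store.cache):
            for key, value in durable.items():
                if image.get(key) not in (value, written[key]):
                    errors.append((step, key, image))
    return errors
```

Calling `check([("write", "a", 1), ("write", "b", 2), ("sync",)])` explores the crash images after each operation. With `buggy_sync=True` the same workload reports a violation at the `sync` step, because one reachable image is missing key `"b"`. Enumerating subsets grows exponentially with the number of pending writes, so this only conveys the flavor of systematic state exploration, not a practical checker.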