Redundant disk arrays: reliable, parallel secondary storage
Redundant disk arrays: reliable, parallel secondary storage
The design and implementation of a log-structured file system
ACM Transactions on Computer Systems (TOCS)
Algorithms and data structures for flash memories
ACM Computing Surveys (CSUR)
Proceedings of the twentieth ACM symposium on Operating systems principles
A design for high-performance flash disks
ACM SIGOPS Operating Systems Review - Systems work at Microsoft Research
Disk failures in the real world: what does an MTTF of 1,000,000 hours mean to you?
FAST '07 Proceedings of the 5th USENIX conference on File and Storage Technologies
EXPLODE: a lightweight, general system for finding serious storage system errors
OSDI '06 Proceedings of the 7th symposium on Operating systems design and implementation
LAST: locality-aware sector translation for NAND flash memory-based storage systems
ACM SIGOPS Operating Systems Review
Characterizing flash memory: anomalies, observations, and applications
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Mean time to meaningless: MTTDL, Markov models, and storage system reliability
HotStorage'10 Proceedings of the 2nd USENIX conference on Hot topics in storage and file systems
Analyzing consistency properties for fun and profit
Proceedings of the 30th annual ACM SIGACT-SIGOPS symposium on Principles of distributed computing
Understanding the impact of power loss on flash memory
Proceedings of the 48th Design Automation Conference
The bleak future of NAND flash memory
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Recon: verifying file system consistency at runtime
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Optimizing NAND flash-based SSDs via retention relaxation
FAST'12 Proceedings of the 10th USENIX conference on File and Storage Technologies
Hi-index | 0.00 |
Modern storage technology (SSDs, No-SQL databases, commoditized RAID hardware, etc.) bring new reliability challenges to the already complicated storage stack. Among other things, the behavior of these new components during power faults--which happen relatively frequently in data centers--is an important yet mostly ignored issue in this dependability-critical area. Understanding how new storage components behave under power fault is the first step towards designing new robust storage systems. In this paper, we propose a new methodology to expose reliability issues in block devices under power faults. Our framework includes specially-designed hardware to inject power faults directly to devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test fifteen commodity SSDs from five different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that thirteen out of the fifteen tested SSD devices exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.