Understanding the robustness of SSDs under power fault

Authors:
Mai Zheng;Joseph Tucek;Feng Qin;Mark Lillibridge
Affiliations:
The Ohio State University;HP Labs;The Ohio State University;HP Labs
Venue:
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Year:
2013

Abstract

Modern storage technologies (SSDs, NoSQL databases, commoditized RAID hardware, etc.) bring new reliability challenges to the already complicated storage stack. Among other things, the behavior of these new components during power faults, which happen relatively frequently in data centers, is an important yet mostly ignored issue in this dependability-critical area. Understanding how new storage components behave under power faults is the first step towards designing new robust storage systems. In this paper, we propose a new methodology to expose reliability issues in block devices under power faults. Our framework includes specially designed hardware to inject power faults directly into devices, workloads to stress storage components, and techniques to detect various types of failures. Applying our testing framework, we test fifteen commodity SSDs from five different vendors using more than three thousand fault injection cycles in total. Our experimental results reveal that thirteen of the fifteen tested SSDs exhibit surprising failure behaviors under power faults, including bit corruption, shorn writes, unserializable writes, metadata corruption, and total device failure.