Differential RAID: Rethinking RAID for SSD reliability

Authors:
Mahesh Balakrishnan;Asim Kadav;Vijayan Prabhakaran;Dahlia Malkhi
Affiliations:
Microsoft Research Silicon Valley, La Avenida, Mountain View, CA;University of Wisconsin-Madison;Microsoft Research Silicon Valley, La Avenida, Mountain View, CA;Microsoft Research Silicon Valley, La Avenida, Mountain View, CA
Venue:
ACM Transactions on Storage (TOS)
Year:
2010

Abstract

SSDs exhibit very different failure characteristics compared to hard drives. In particular, the bit error rate (BER) of an SSD climbs as it receives more writes. As a result, RAID arrays composed of SSDs are subject to correlated failures. By balancing writes evenly across the array, RAID schemes can wear out devices at similar times. When a device in the array fails towards the end of its lifetime, the high BER of the remaining devices can result in data loss. We propose Diff-RAID, a parity-based redundancy solution that creates an age differential in an array of SSDs. Diff-RAID distributes parity blocks unevenly across the array, leveraging their higher update rate to age devices at different rates. To maintain this age differential when old devices are replaced by new ones, Diff-RAID reshuffles the parity distribution on each drive replacement. We evaluate Diff-RAID's reliability by using real BER data from 12 flash chips on a simulator and show that it is more reliable than RAID-5, in some cases by multiple orders of magnitude. We also evaluate Diff-RAID's performance using a software implementation on a 5-device array of 80 GB Intel X25-M SSDs and show that it offers a trade-off between throughput and reliability.
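
The link between an uneven parity distribution and uneven aging can be illustrated with a small calculation. The sketch below is not taken from the paper's implementation; it assumes a simplified workload of random single-block writes, where each logical write updates one data block and the parity block of the same stripe, and where the data block is equally likely to sit on any non-parity device. The function name `relative_write_load` and its parameter are hypothetical.

```python
import math


def relative_write_load(parity_fractions):
    """Expected device writes per logical write, for each device.

    parity_fractions[i] is the fraction of stripes whose parity block
    lives on device i (an illustrative parameter; must sum to 1).

    Assumed model: a random single-block write hits a uniformly chosen
    stripe, writes that stripe's parity device once, and writes one of
    the other n - 1 devices (chosen uniformly) once.
    """
    n = len(parity_fractions)
    if n < 2:
        raise ValueError("need at least two devices")
    if any(p < 0 for p in parity_fractions):
        raise ValueError("parity fractions must be non-negative")
    if not math.isclose(sum(parity_fractions), 1.0, abs_tol=1e-9):
        raise ValueError("parity fractions must sum to 1")

    loads = []
    for p in parity_fractions:
        parity_writes = p                 # stripe's parity is on this device
        data_writes = (1.0 - p) / (n - 1)  # parity elsewhere, data lands here
        loads.append(parity_writes + data_writes)
    return loads


if __name__ == "__main__":
    # Uniform parity (RAID-5 style) on 5 devices: every device ages alike.
    print(relative_write_load([0.2, 0.2, 0.2, 0.2, 0.2]))
    # Skewed parity: the first device absorbs more writes and ages faster.
    print(relative_write_load([0.6, 0.1, 0.1, 0.1, 0.1]))
```

Under this assumed model, a uniform distribution gives every device the same load, while concentrating parity on one device makes it wear out sooner than the rest, which is the age differential the abstract describes. Real workloads with full-stripe writes would narrow the gap.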