Differential RAID: rethinking RAID for SSD reliability

  • Authors:
  • Mahesh Balakrishnan;Asim Kadav;Vijayan Prabhakaran;Dahlia Malkhi

  • Affiliations:
  • Microsoft Research Silicon Valley, Mountain View, CA, USA;University of Wisconsin, Madison, WI, USA;Microsoft Research Silicon Valley, Mountain View, CA, USA;Microsoft Research Silicon Valley, Mountain View, CA, USA

  • Venue:
  • Proceedings of the 5th European conference on Computer systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

SSDs exhibit very different failure characteristics compared to hard drives. In particular, the Bit Error Rate (BER) of an SSD climbs as it receives more writes. As a result, RAID arrays composed from SSDs are subject to correlated failures. By balancing writes evenly across the array, RAID schemes can wear out devices at similar times. When a device in the array fails towards the end of its lifetime, the high BER of the remaining devices can result in data loss. We propose Diff-RAID, a parity-based redundancy solution that creates an age differential in an array of SSDs. Diff-RAID distributes parity blocks unevenly across the array, leveraging their higher update rate to age devices at different rates. To maintain this age differential when old devices are replaced by new ones, Diff-RAID reshuffles the parity distribution on each drive replacement. We evaluate Diff-RAID's reliability by using real BER data from 12 flash chips on a simulator and show that it is more reliable than RAID-5, in some cases by multiple orders of magnitude. We also evaluate Diff-RAID's performance using a software implementation on a 5-device array of 80 GB Intel X25-M SSDs and show that it offers a trade-off between throughput and reliability.