Optimal recovery of single disk failure in RDP code storage systems

  • Authors:
  • Liping Xiang;Yinlong Xu;John C.S. Lui;Qian Chang

  • Affiliations:
  • University of Science and Technology of China, Hefei, China;University of Science and Technology of China, Hefei, China;The Chinese University of Hong Kong, Hong Kong, Hong Kong;University of Science and Technology of China, Hefei, China

  • Venue:
  • Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems
  • Year:
  • 2010

Abstract

Modern storage systems use thousands of inexpensive disks to meet the storage requirements of applications. To enhance data availability, some form of redundancy is used. For example, conventional RAID-5 systems protect against a single disk failure only, while recent advanced coding techniques such as row-diagonal parity (RDP) maintain data availability with up to two disk failures. To reduce the probability of data unavailability, disk recovery (or rebuild) is carried out whenever a single disk fails. We show that the conventional recovery scheme of RDP code for a single disk failure is inefficient and suboptimal. In this paper, we propose an optimal and efficient disk recovery scheme, Row-Diagonal Optimal Recovery (RDOR), for a single disk failure in RDP code. RDOR has the following properties: (1) it is read optimal in the sense that it issues the smallest number of disk reads needed to recover the failed disk; (2) it is load balanced in that all surviving disks are subjected to the same amount of additional workload while the failed disk is rebuilt. We carefully explore the design state space and theoretically show the optimality of RDOR. We carry out a performance evaluation to quantify the merits of RDOR on some widely used disks.
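
The abstract does not spell out how reads can be saved, so the following Python sketch is only an illustration under stated assumptions, not the authors' RDOR algorithm. It assumes the standard RDP layout (a prime p, p - 1 rows per stripe, data on disks 0 to p - 2, row parity on disk p - 1, diagonal parity on disk p) and uses a brute-force search over which lost blocks are rebuilt from row parity and which from diagonal parity. All function names (`rdp_encode`, `diagonal_cells`, `read_set`, `recover`, `best_hybrid`) are hypothetical. The sketch counts distinct blocks read and ignores the load-balancing property.

```python
from itertools import product


def xor_all(values):
    """XOR of an iterable of ints (0 for an empty iterable)."""
    acc = 0
    for v in values:
        acc ^= v
    return acc


def diagonal_cells(d, p):
    """Cells (row, col) on diagonal d, restricted to columns 0..p-1
    (data disks plus the row-parity disk). Assumed layout: cell (r, c)
    lies on diagonal (r + c) mod p, and rows run from 0 to p-2."""
    return [((d - c) % p, c) for c in range(p) if (d - c) % p < p - 1]


def rdp_encode(data, p):
    """Build a (p-1) x (p+1) stripe from (p-1) x (p-1) data blocks."""
    stripe = [list(row) + [xor_all(row), 0] for row in data]
    for d in range(p - 1):  # diagonal p-1 is the one with no stored parity
        stripe[d][p] = xor_all(stripe[r][c] for r, c in diagonal_cells(d, p))
    return stripe


def read_set(row, failed, use_diagonal, p):
    """Surviving cells needed to rebuild the lost block at (row, failed)."""
    if use_diagonal:
        d = (row + failed) % p
        cells = [cell for cell in diagonal_cells(d, p) if cell[1] != failed]
        return cells + [(d, p)]  # plus the diagonal-parity block
    return [(row, c) for c in range(p) if c != failed]


def recover(stripe, failed, choices, p):
    """Rebuild disk `failed` (0..p-1). choices[r] is True to use diagonal
    parity for row r. Returns (rebuilt column, distinct blocks read)."""
    reads = set()
    column = []
    for row, use_diag in enumerate(choices):
        cells = read_set(row, failed, use_diag, p)
        reads.update(cells)
        column.append(xor_all(stripe[r][c] for r, c in cells))
    return column, len(reads)


def best_hybrid(stripe, failed, p):
    """Exhaustively try every row/diagonal mix (only feasible for small p)."""
    best = None
    for choices in product([False, True], repeat=p - 1):
        # A block on the missing diagonal p-1 must be rebuilt from row parity.
        if any(use and (r + failed) % p == p - 1
               for r, use in enumerate(choices)):
            continue
        column, n_reads = recover(stripe, failed, choices, p)
        if best is None or n_reads < best[1]:
            best = (choices, n_reads, column)
    return best


p = 5
data = [[(7 * r + 3 * c + 1) % 251 for c in range(p - 1)] for r in range(p - 1)]
stripe = rdp_encode(data, p)
original = [stripe[r][0] for r in range(p - 1)]
conv_col, conv_reads = recover(stripe, 0, [False] * (p - 1), p)
choices, best_reads, best_col = best_hybrid(stripe, 0, p)
print(conv_reads, best_reads, choices)
```

In this toy setting (p = 5, disk 0 failed), rebuilding every block from row parity reads 16 distinct blocks, while the best mix found by the search reads 12, because blocks fetched for row-based rebuilds are reused by diagonal-based rebuilds. The exhaustive search grows exponentially with p and is meant only to make the idea of read overlap concrete.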