In search of I/O-optimal recovery from disk failures

  • Authors:
  • Osama Khan;Randal Burns;James Park;Cheng Huang

  • Affiliations:
  • Department of Computer Science, Johns Hopkins University;Department of Computer Science, Johns Hopkins University;Department of Electrical Eng. and Comp. Science, University of Tennessee;Microsoft Research

  • Venue:
  • HotStorage'11 Proceedings of the 3rd USENIX conference on Hot topics in storage and file systems
  • Year:
  • 2011

Abstract

We address the problem of minimizing the I/O needed to recover from disk failures in erasure-coded storage systems. The principal result is an algorithm that finds the optimal I/O recovery from an arbitrary number of disk failures for any XOR-based erasure code. We also describe a family of codes with high fault tolerance and low recovery I/O; e.g., one instance tolerates up to 11 failures and recovers a lost block in 4 I/Os. While we have determined I/O-optimal recovery for any given code, it remains an open problem to identify codes with the best recovery properties. We describe our ongoing efforts toward characterizing space overhead versus recovery I/O tradeoffs and generating codes that realize these bounds.
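
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the underlying idea, not the authors' method. It assumes an XOR-based code can be written as a list of parity equations (sets of symbols that XOR to zero) and brute-forces, for each lost symbol, which equation to decode with so that the total number of distinct symbols read is smallest. The function name `min_io_recovery`, the `(disk, row)` symbol naming, and the toy layout are all hypothetical, and the sketch only considers equations that contain exactly one lost symbol.

```python
from itertools import product

def min_io_recovery(equations, lost):
    """Brute-force search for a recovery that reads the fewest symbols.

    equations: iterable of frozensets; each set of symbols XORs to zero.
    lost:      iterable of lost symbols.
    Returns (symbols_to_read, chosen_equations).
    Assumption: each lost symbol is decoded from one equation that
    contains no other lost symbol (no chained decoding).
    """
    lost = set(lost)
    candidates = []
    for sym in sorted(lost):
        # Equations usable for this symbol: it is the only unknown in them.
        opts = [eq for eq in equations if sym in eq and len(eq & lost) == 1]
        if not opts:
            raise ValueError(f"no single equation recovers {sym}")
        candidates.append(opts)

    best = None
    for choice in product(*candidates):
        # Symbols shared between chosen equations are read only once.
        reads = set().union(*choice) - lost
        if best is None or len(reads) < len(best[0]):
            best = (reads, choice)
    return best

# Hypothetical toy layout: symbols are (disk, row); disks 0-2 hold data,
# disk 3 holds row parity, disk 4 holds a second, made-up parity.
# No claim is made about the fault tolerance of this layout.
EQUATIONS = [
    frozenset({(0, 0), (1, 0), (2, 0), (3, 0)}),  # row parity, row 0
    frozenset({(0, 1), (1, 1), (2, 1), (3, 1)}),  # row parity, row 1
    frozenset({(0, 0), (1, 1), (2, 1), (4, 0)}),  # second parity, symbol 0
    frozenset({(0, 1), (1, 0), (2, 0), (4, 1)}),  # second parity, symbol 1
]

reads, chosen = min_io_recovery(EQUATIONS, lost=[(0, 0), (0, 1)])
print(len(reads), sorted(reads))
```

In this toy layout, rebuilding disk 0 from the two row-parity equations alone would read six symbols, while mixing one row equation with one second-parity equation lets the two reads of disks 1 and 2 be shared. The search is exponential in the number of lost symbols, so it is only meant to make the objective concrete on small examples.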