A systematic approach to system state restoration during storage controller micro-recovery

  • Authors:
  • Sangeetha Seshadri; Lawrence Chiu; Ling Liu

  • Affiliations:
  • Georgia Institute of Technology; IBM Almaden Research Center; Georgia Institute of Technology

  • Venue:
  • FAST '09 Proceedings of the 7th Conference on File and Storage Technologies
  • Year:
  • 2009

Abstract

Micro-recovery, or failure recovery at a fine granularity, is a promising approach to improve the recovery time of software for modern storage systems. Instead of stalling the whole system during failure recovery, micro-recovery can facilitate recovery by a single thread while the system continues to run. A key challenge in performing micro-recovery is to be able to perform efficient and effective state restoration while accounting for dynamic dependencies between multiple threads in a highly concurrent environment. We present Log(Lock), a practical and flexible architecture for performing state restoration without re-architecting legacy code. We formally model thread dependencies based on accesses to both shared state and resources. The Log(Lock) execution model tracks dependencies at runtime and captures the failure context through the restoration level. We develop restoration protocols based on recovery points and restoration levels that identify when micro-recovery is possible and the recovery actions that need to be performed for a given failure context. We have implemented Log(Lock) in a real enterprise storage controller. Our experimental evaluation shows that Log(Lock)-enabled micro-recovery is efficient. It imposes