Recovery scopes, recovery groups, and fine-grained recovery in enterprise storage controllers with multi-core processors

Authors:
S. Seshadri, L. Liu, L. Chiu
Affiliations:
Georgia Institute of Technology, Atlanta, GA (S. Seshadri, L. Liu); IBM Almaden Research Center, San Jose, CA (L. Chiu)
Venue:
IBM Journal of Research and Development
Year:
2009

Abstract

In this paper, we extend a previously published approach to error recovery in enterprise storage controllers with multi-core processors. Our approach first partitions the set of tasks in the runtime of the controller software into clusters of dependent tasks, called recovery scopes. These recovery scopes are then mapped onto a set of recovery groups, which form the basis for scheduling tasks both during recovery and during normal operation. This recovery-aware scheduling (RAS) replaces the performance-based scheduling of the storage controller. Through simulation and benchmark experiments, we find that: 1) the performance of RAS appears to depend critically on the values of recovery-related parameters; and 2) our fine-grained recovery approach promises to enhance storage system availability while keeping the additional overhead, and the resulting degradation in performance, under control.
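The abstract does not specify how scopes or groups are computed. The following is a minimal sketch under stated assumptions, not the authors' algorithm. It models "dependent tasks" as an undirected list of dependency pairs and forms recovery scopes as connected components using union-find. It then maps scopes onto a fixed number of recovery groups with a least-loaded greedy heuristic, which is an assumed stand-in. Finally, it dispatches only ready tasks whose group is not currently recovering. The functions `recovery_scopes`, `map_to_groups`, and `dispatchable`, as well as the task names, are all hypothetical.

```python
from collections import defaultdict


def recovery_scopes(tasks, dependencies):
    """Hypothetical: recovery scopes = connected components of the task
    dependency graph, computed with union-find (path halving)."""
    parent = {t: t for t in tasks}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in dependencies:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two clusters

    members = defaultdict(set)
    for t in tasks:
        members[find(t)].add(t)
    # Deterministic order: largest scope first, ties broken by task names.
    return sorted((frozenset(m) for m in members.values()),
                  key=lambda s: (-len(s), sorted(s)))


def map_to_groups(scopes, num_groups):
    """Assumed heuristic (not from the paper): place each scope, largest
    first, into the recovery group holding the fewest tasks so far."""
    groups = [set() for _ in range(num_groups)]
    for scope in scopes:  # recovery_scopes() already sorts largest-first
        target = min(range(num_groups), key=lambda g: (len(groups[g]), g))
        groups[target] |= scope
    return groups


def dispatchable(ready_tasks, groups, recovering):
    """Recovery-aware dispatch sketch: hold back tasks whose group is in
    recovery; tasks in all other groups keep running."""
    group_of = {t: g for g, members in enumerate(groups) for t in members}
    return [t for t in ready_tasks if group_of[t] not in recovering]


if __name__ == "__main__":
    tasks = ["cache", "destage", "raid", "net_rx", "net_tx", "stats"]
    deps = [("cache", "destage"), ("destage", "raid"), ("net_rx", "net_tx")]
    groups = map_to_groups(recovery_scopes(tasks, deps), num_groups=2)
    # Group 0 (cache/destage/raid) is recovering; group 1 keeps running.
    print(dispatchable(["raid", "net_tx", "stats"], groups, recovering={0}))
    # → ['net_tx', 'stats']
```

Grouping whole scopes, rather than individual tasks, means that a failure confined to one scope only pauses the tasks in its recovery group. This matches the abstract's goal of limiting the performance impact of recovery on the rest of the controller.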