IDO: intelligent data outsourcing with improved RAID reconstruction performance in large-scale data centers

Authors:
Suzhen Wu;Hong Jiang;Bo Mao
Affiliations:
Computer Science Department, Xiamen University and Department of Computer Science & Engineering, University of Nebraska-Lincoln;Department of Computer Science & Engineering, University of Nebraska-Lincoln;Department of Computer Science & Engineering, University of Nebraska-Lincoln
Venue:
LISA '12: Proceedings of the 26th International Conference on Large Installation System Administration: Strategies, Tools, and Techniques
Year:
2012

Abstract

Dealing with disk failures has become an increasingly common task for system administrators, given the high disk failure rates in large-scale data centers consisting of hundreds of thousands of disks. Achieving fast recovery from disk failures in general, and high online RAID-reconstruction performance in particular, has therefore become crucial. To address this problem, this paper proposes IDO (Intelligent Data Outsourcing), a proactive, zone-based optimization that significantly improves online RAID-reconstruction performance. IDO proactively identifies popular data zones during normal operation and migrates them to a surrogate set at the onset of reconstruction. As a result, most, if not all, user I/O requests are serviced by the surrogate set rather than the degraded set during reconstruction. Extensive trace-driven experiments on our lightweight prototype implementation of IDO demonstrate that, compared with the existing state-of-the-art reconstruction approaches WorkOut and VDF, IDO simultaneously reduces reconstruction time and average user response time. Moreover, IDO can be extended to improve the performance of other background RAID support tasks, such as re-synchronization, RAID reshape, and disk scrubbing.
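
The zone-based idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' prototype: the class name `ZoneRouter`, the parameters `zone_size` and `hot_zone_count`, and the use of a plain access counter as the popularity measure are all hypothetical choices made for illustration. The abstract does not specify how popularity is measured or how many zones are migrated.

```python
from collections import Counter


class ZoneRouter:
    """Toy model of zone-based data outsourcing (illustrative only).

    All names and parameters here are assumptions; the abstract does not
    describe the actual data structures or popularity metric used by IDO.
    """

    def __init__(self, zone_size, hot_zone_count):
        self.zone_size = zone_size            # bytes per zone (assumed)
        self.hot_zone_count = hot_zone_count  # zones to migrate (assumed)
        self.access_counts = Counter()        # per-zone access counter
        self.migrated = set()                 # zones held by the surrogate set
        self.reconstructing = False

    def zone_of(self, offset):
        """Map a byte offset to its zone index."""
        return offset // self.zone_size

    def record_access(self, offset):
        """Normal state: track how often each zone is accessed."""
        self.access_counts[self.zone_of(offset)] += 1

    def start_reconstruction(self):
        """Onset of reconstruction: mark the most popular zones as migrated."""
        hot = self.access_counts.most_common(self.hot_zone_count)
        self.migrated = {zone for zone, _count in hot}
        self.reconstructing = True
        return sorted(self.migrated)

    def route(self, offset):
        """Decide which disk set should service a request at this offset."""
        if not self.reconstructing:
            return "primary"
        if self.zone_of(offset) in self.migrated:
            return "surrogate"
        return "degraded"


# Example: three zones with different popularity, two of which get migrated.
router = ZoneRouter(zone_size=1024, hot_zone_count=2)
for offset in (0, 10, 20, 5000, 5001, 9000):
    router.record_access(offset)
router.start_reconstruction()
print(router.route(15), router.route(9000))
```

In this sketch, requests that fall into a migrated zone are redirected to the surrogate set, so only requests to unpopular zones still compete with reconstruction I/O on the degraded set. A real system would also need to copy the zone contents and keep writes consistent, which the toy model omits.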