Victim disk first: an asymmetric cache to boost the performance of disk arrays under faulty conditions

Authors:
Shenggang Wan;Qiang Cao;Jianzhong Huang;Siyi Li;Xin Li;Shenghui Zhan;Li Yu;Changsheng Xie;Xubin He
Affiliations:
Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Huazhong University of Science & Technology, Wuhan, China;Electrical & Computer Engineering, Virginia Commonwealth University, Richmond, VA
Venue:
USENIX ATC '11: Proceedings of the 2011 USENIX Annual Technical Conference
Year:
2011

Abstract

The buffer cache plays an essential role in bridging the gap between upper-level computational components and lower-level storage devices. A good buffer cache management scheme should benefit not only the computational components but also the storage components, by reducing disk I/Os. Existing cache replacement algorithms are well optimized for disks in normal mode but are inefficient under faulty scenarios, such as a parity-based disk array with one or more faulty disks. To address this issue, we propose a novel asymmetric buffer cache replacement strategy, named Victim (or faulty) Disk(s) First (VDF) cache, to improve the reliability and performance of a storage system consisting of a buffer cache and disk arrays. The basic idea is to give higher priority to caching the blocks of the faulty disks when the disk array fails, thus reducing the I/Os directed to the faulty disks. To verify the effectiveness of the VDF cache, we have integrated VDF into two popular cache algorithms, LFU and LRU, yielding VDF-LFU and VDF-LRU, respectively. We have conducted extensive simulations and built a prototype implementation. The simulation results show that VDF-LFU can reduce disk I/Os to surviving disks by up to 42.3% and that VDF-LRU can reduce them by up to 36.2%. Our measurement results also show that VDF-LFU can speed up online recovery by up to 46.3% under a spare-rebuilding mode with online reconstruction, or improve the maximum system service rate by up to 47.7% under a degraded mode without a reconstruction workload. Similarly, VDF-LRU can speed up online recovery by up to 34.6%, or improve the system service rate by up to 28.4%.
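
The asymmetry described in the abstract can be illustrated with a small Python sketch. It is not the authors' VDF-LRU: the abstract does not specify how the priority is computed, so the sketch assumes the simplest possible rule, namely that a block on a faulty disk is evicted only when no block from a surviving disk is left in the cache. The class name `VictimFirstLRU`, its methods, and the `(disk, block)` key layout are all hypothetical. The motivation for such a rule is that, in a single-parity array, a miss on a faulty-disk block has to be rebuilt from the matching blocks on the surviving disks, so it costs far more I/O than a miss on a healthy disk.

```python
from collections import OrderedDict


class VictimFirstLRU:
    """Toy LRU cache that shields blocks of faulty disks from eviction.

    Assumption (not from the paper): strict two-level priority.
    """

    def __init__(self, capacity, faulty_disks=()):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.faulty_disks = set(faulty_disks)
        # Maps (disk, block) -> data, ordered from least to most recent.
        self.blocks = OrderedDict()

    def get(self, disk, block):
        key = (disk, block)
        if key not in self.blocks:
            return None  # miss: the caller would go to the array
        self.blocks.move_to_end(key)  # mark as most recently used
        return self.blocks[key]

    def put(self, disk, block, data):
        key = (disk, block)
        if key in self.blocks:
            self.blocks[key] = data
            self.blocks.move_to_end(key)
            return
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[key] = data

    def _evict(self):
        # Prefer the least recently used block that lives on a surviving disk.
        victim = next(
            (k for k in self.blocks if k[0] not in self.faulty_disks), None
        )
        if victim is None:
            # Only faulty-disk blocks are cached: fall back to plain LRU.
            victim = next(iter(self.blocks))
        del self.blocks[victim]
        return victim


if __name__ == "__main__":
    cache = VictimFirstLRU(capacity=2, faulty_disks={0})
    cache.put(0, 7, "a")  # block on the faulty disk, oldest entry
    cache.put(1, 3, "b")  # block on a surviving disk
    cache.put(2, 5, "c")  # cache is full: (1, 3) is evicted, not (0, 7)
    print(sorted(cache.blocks))  # [(0, 7), (2, 5)]
```

With an empty `faulty_disks` set the class behaves as an ordinary LRU cache, which mirrors the idea that the asymmetric policy only matters once the array is degraded. A weighted scheme that still lets very cold faulty-disk blocks age out would be a natural refinement, but that goes beyond what the abstract states.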