Idle read after write: IRAW

Authors:
Alma Riska; Erik Riedel
Affiliations:
Seagate Research, Pittsburgh, PA; Seagate Research, Pittsburgh, PA
Venue:
ATC'08 USENIX 2008 Annual Technical Conference
Year:
2008

Abstract

Despite its low occurrence rate, silent data corruption is a growing concern for storage system designers. Throughout the storage hierarchy, from the file system down to the disk drives, various solutions exist to avoid, detect, and correct silent data corruption. One cause of silent data corruption is errors that go undetected while WRITEs complete. A portion of these WRITE errors can be detected and corrected by verifying the data written on the disk against the data in the disk cache. Write verification is traditionally scheduled immediately after a WRITE completes (Read After Write - RAW), which is unattractive because it degrades user performance. To reduce the performance penalty associated with RAW, we propose to retain the written content in the disk cache and verify it once the disk drive becomes idle. Although attractive, this approach (called IRAW - Idle Read After Write) contends with user traffic and other background activities for resources, i.e., cache and idle time. In this paper, we present a trace-driven evaluation of IRAW and show its feasibility. Our analysis indicates that idleness is present in disk drives and can be used for WRITE verification with minimal effect on user performance. IRAW benefits significantly if some amount of cache, i.e., 1 or 2 MB, is dedicated to retaining the unverified WRITEs. If the cache is shared with user requests, then a cache retention policy that places both READs and WRITEs, upon completion, at the most recently used cache segment yields the best IRAW performance without affecting the cache hit ratio of user READs or overall user performance.
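The idea in the abstract (keep each written block in cache, then read it back and compare once the drive is idle) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: the class `IrawDisk`, its methods, the `corrupt` flag used to simulate an undetected WRITE error, and the policy of dropping the oldest unverified WRITE when the dedicated buffer is full are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class IrawDisk:
    """Toy model of a drive that defers WRITE verification to idle time.

    All names here are hypothetical; the abstract does not describe an API.
    """

    def __init__(self, max_unverified_blocks):
        self.media = {}                  # block address -> data stored on the platter
        self.unverified = OrderedDict()  # block address -> cached copy awaiting verification
        self.max_unverified_blocks = max_unverified_blocks
        self.dropped = 0                 # WRITEs evicted from cache before verification
        self.repaired = 0                # WRITE errors found and corrected while idle

    def write(self, lba, data, corrupt=False):
        """Complete a WRITE immediately and keep its content for later checking."""
        if corrupt:
            # Simulate an undetected WRITE error by flipping the first byte.
            self.media[lba] = bytes([data[0] ^ 0xFF]) + data[1:]
        else:
            self.media[lba] = data
        # A newer WRITE to the same block supersedes the older cached copy.
        self.unverified.pop(lba, None)
        if len(self.unverified) >= self.max_unverified_blocks:
            # Assumption: when the dedicated buffer is full, the oldest
            # unverified WRITE is dropped and goes unverified.
            self.unverified.popitem(last=False)
            self.dropped += 1
        self.unverified[lba] = data

    def on_idle(self, budget):
        """Verify up to `budget` pending WRITEs; return how many were checked."""
        verified = 0
        while self.unverified and verified < budget:
            lba, cached = self.unverified.popitem(last=False)
            if self.media[lba] != cached:
                # Mismatch between platter and cache: rewrite from the cached copy.
                self.media[lba] = cached
                self.repaired += 1
            verified += 1
        return verified
```

In this model a WRITE never waits for its verification read, which is the contrast with RAW. The `dropped` counter shows why the amount of cache dedicated to unverified WRITEs matters: a WRITE evicted before an idle period arrives is never checked.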