Towards Availability Benchmarks: A Case Study of Software RAID Systems

Authors:
Aaron Brown; David A. Patterson
Affiliations:
Computer Science Division, University of California at Berkeley, Berkeley, CA (both authors)
Venue:
ATEC '00 Proceedings of the annual conference on USENIX Annual Technical Conference
Year:
2000

Abstract

Benchmarks have historically played a key role in guiding the progress of computer science systems research and development, but they have traditionally neglected availability, maintainability, and evolutionary growth, areas that have recently become critically important in high-end system design. As a first step in addressing this deficiency, we introduce a general methodology for benchmarking the availability of computer systems. Our methodology uses fault injection to provoke situations where availability may be compromised, leverages existing performance benchmarks for workload generation and data collection, and can present results both as detail-rich graphs and as distilled numerical summaries. We apply the methodology to measure the availability of the software RAID systems shipped with Linux, Solaris 7 Server, and Windows 2000 Server, and find that it is powerful enough not only to quantify the impact of various failure conditions on the availability of these systems, but also to unearth their design philosophies with respect to transient errors and recovery policy.
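The abstract mentions distilling fault-injection runs into numerical summaries. Below is a minimal sketch of one way such a distillation could work. It assumes a per-interval quality-of-service metric, such as requests per second reported by a performance benchmark, and a "normal behavior" band taken as the mean plus or minus k standard deviations of fault-free runs. From these it computes two summary figures: the fraction of post-fault intervals that fall below the band, and the number of intervals until the metric returns to the band for good. The function names, the band definition, and k = 3 are all illustrative assumptions, not the paper's exact procedure.

```python
from statistics import mean, stdev


def normal_band(baseline, k=3.0):
    """Return (low, high) bounds of normal behavior from fault-free samples.

    Assumption (illustrative only): normal behavior is mean +/- k standard
    deviations of the per-interval metric measured with no faults injected.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - k * sigma, mu + k * sigma


def summarize(samples, band, fault_index):
    """Distill a per-interval QoS trace into two hypothetical availability figures.

    samples:     per-interval metric (e.g. requests/s) from a fault-injection run
    band:        (low, high) normal-behavior bounds from normal_band()
    fault_index: index of the interval in which the fault was injected
    """
    if not 0 <= fault_index < len(samples):
        raise ValueError("fault_index must fall inside the trace")
    low, _high = band
    after = samples[fault_index:]

    # Only drops below the band count as degraded service in this sketch.
    degraded = [s < low for s in after]
    degraded_fraction = sum(degraded) / len(after)

    # Recovery time: intervals from fault injection until the metric is back
    # above the lower bound and stays there for the rest of the trace.
    # None means the system never recovered within the observed trace.
    recovery = None
    for i in range(len(after)):
        if all(s >= low for s in after[i:]):
            recovery = i
            break

    return {"degraded_fraction": degraded_fraction, "recovery_intervals": recovery}


if __name__ == "__main__":
    band = normal_band([100, 102, 98, 101, 99])
    trace = [100, 101, 60, 70, 80, 99, 100, 101]
    print(summarize(trace, band, fault_index=2))
```

Take a fault-free baseline of [100, 102, 98, 101, 99] and a trace in which the fault lands in interval 2. Half of the post-fault intervals fall below the band, and the metric re-enters the band for good after 3 intervals. A graphical presentation would instead plot the raw trace against the band over time. The numerical form trades that detail for easy comparison across systems.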