Benchmarks have historically played a key role in guiding the progress of computer science systems research and development, but they have traditionally neglected availability, maintainability, and evolutionary growth, areas that have recently become critically important in high-end system design. As a first step toward addressing this deficiency, we introduce a general methodology for benchmarking the availability of computer systems. Our methodology uses fault injection to provoke situations in which availability may be compromised, leverages existing performance benchmarks for workload generation and data collection, and can present results either as detail-rich graphs or as distilled numerical summaries. We apply the methodology to measure the availability of the software RAID systems shipped with Linux, Solaris 7 Server, and Windows 2000 Server. We find that it is powerful enough not only to quantify the impact of various failure conditions on the availability of these systems, but also to unearth their design philosophies with respect to transient errors and recovery policy.
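To make the idea of a distilled numerical summary concrete, the following is a minimal sketch and not the metrics defined by the methodology itself. It assumes that a performance benchmark reports one throughput sample per fixed interval, that a single fault is injected at a known sample index, and that "normal" behavior is a band of `k` standard deviations below the fault-free mean. The names `summarize` and `AvailabilitySummary`, the choice of `k`, and the two reported quantities are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Optional, Sequence


@dataclass
class AvailabilitySummary:
    baseline: float                    # mean fault-free throughput
    floor: float                       # lower edge of the assumed "normal" band
    degraded_seconds: float            # time below the band after the fault
    recovery_seconds: Optional[float]  # time until back in band for good; None if never


def summarize(samples: Sequence[float], interval: float,
              fault_index: int, k: float = 2.0) -> AvailabilitySummary:
    """Summarize one fault-injection run (illustrative assumption, see lead-in).

    samples[i] is the throughput measured during interval i; the fault is
    injected at the start of samples[fault_index].
    """
    pre = samples[:fault_index]
    if len(pre) < 2:
        raise ValueError("need at least two fault-free samples")

    # Assumed definition of normal behavior: within k standard deviations
    # below the mean of the fault-free samples.
    baseline = mean(pre)
    floor = baseline - k * stdev(pre)

    # Mark every post-fault interval whose throughput falls below the band.
    below = [s < floor for s in samples[fault_index:]]
    degraded = sum(below) * interval

    if not any(below):
        recovery = 0.0       # the fault had no visible effect
    elif below[-1]:
        recovery = None      # still degraded when the run ended
    else:
        last_bad = max(i for i, b in enumerate(below) if b)
        recovery = (last_bad + 1) * interval

    return AvailabilitySummary(baseline, floor, degraded, recovery)
```

Calling `summarize` on the throughput trace of a single run, together with the sampling interval and the index at which the fault was injected, yields one row of a summary table. The same trace plotted over time, with the band drawn as a horizontal line, would correspond to the graphical form of the results.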