Quantifying the Performability of Cluster-Based Services
IEEE Transactions on Parallel and Distributed Systems
We propose a two-phase methodology for quantifying the performability (performance and availability) of cluster-based Internet services. In the first phase, evaluators use a fault-injection infrastructure to measure the impact of faults on the server's performance. In the second phase, evaluators use an analytical model to combine an expected fault load with the measurements from the first phase to assess the server's performability. Using this model, evaluators can study the server's sensitivity to different design decisions, fault rates, and environmental factors. To demonstrate our methodology, we study the performability of four versions of the PRESS Web server against five classes of faults, quantifying the effects of different design decisions on performance and availability. Finally, to further show the utility of our model, we quantify the impact of two hypothetical changes: reduced human-operator response time and the use of RAIDs.
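The abstract does not give the model's equations. The Python sketch below is one rough illustration of how a second-phase model might combine an expected fault load with first-phase measurements. It rests on assumptions that are ours, not the paper's:

- Each fault class has a per-component mean time to failure.
- The server's reaction to a fault is a sequence of stages, and fault injection measured each stage's duration and throughput.
- Faults are rare enough never to overlap.
- Availability means the fraction of offered requests that are served.

The names `Stage`, `FaultClass`, and `expected_performance` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stage:
    """One phase of the server's reaction to a fault (hypothetical structure)."""
    duration_s: float  # seconds spent in this stage, e.g. detection or repair
    throughput: float  # requests/s served during the stage, measured in phase 1


@dataclass
class FaultClass:
    """A class of faults in the expected fault load (hypothetical structure)."""
    name: str
    mttf_s: float      # mean time to failure of one component, in seconds
    components: int    # how many components can suffer this fault
    stages: list[Stage] = field(default_factory=list)


def expected_performance(normal_tput: float, offered_load: float,
                         faults: list[FaultClass]) -> tuple[float, float]:
    """Return (average throughput, availability) under the fault load.

    Assumes faults never overlap, so the fraction of time spent in each
    degraded stage is rate * duration and these fractions can be summed.
    Availability here means the fraction of offered requests served.
    """
    degraded_fraction = 0.0  # total fraction of time spent in any degraded stage
    degraded_work = 0.0      # time-weighted throughput delivered while degraded
    for fault in faults:
        rate = fault.components / fault.mttf_s  # faults per second, all components
        for stage in fault.stages:
            frac = rate * stage.duration_s
            degraded_fraction += frac
            degraded_work += frac * stage.throughput
    if degraded_fraction >= 1.0:
        raise ValueError("fault load violates the non-overlapping-fault assumption")
    avg_tput = (1.0 - degraded_fraction) * normal_tput + degraded_work
    return avg_tput, avg_tput / offered_load


if __name__ == "__main__":
    # Illustrative, made-up numbers: 4 nodes, one crash per node every 10^6 s,
    # 10 s to detect the crash (nothing served), then an operator takes
    # repair_s seconds to restore the node while 3 of 4 nodes serve 750 req/s.
    for repair_s in (3600.0, 1800.0):
        crash = FaultClass("node crash", mttf_s=1e6, components=4,
                           stages=[Stage(10.0, 0.0), Stage(repair_s, 750.0)])
        tput, avail = expected_performance(1000.0, 1000.0, [crash])
        print(f"repair {repair_s:.0f} s: {tput:.2f} req/s, availability {avail:.5f}")
```

With these invented inputs, halving the operator's repair time raises the computed availability from about 0.9964 to about 0.9982. This is the kind of what-if question the abstract describes for reduced operator response time. Under the same assumptions, a change such as adding RAID would be modeled by lowering the rate of a disk-fault class or shortening its stages.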