IEEE Transactions on Dependable and Secure Computing
IEEE Transactions on Parallel and Distributed Systems
Resilience is more than availability
Proceedings of the 2011 workshop on New security paradigms workshop
Abstract: Reliability and availability have long been considered twin system properties that could be enhanced by distribution. Paradoxically, the traditional definitions of these properties recognize neither the positive impact of recovery (as distinct from simple repair and restart) on reliability, nor the negative effect of recovery, and of internetworking of clients and servers, on availability. As a result, the standard definitions tend to underestimate reliability and overestimate availability. We offer revised definitions of these two critical metrics, which we call service reliability and service availability, that improve the match between their formal expression and their intuitive meaning. A fortuitous advantage of our approach is that the product of our two metrics yields a highly meaningful figure of merit for the overall dependability of a system. But techniques that enhance system dependability exact a performance cost, so we conclude with a cohesive definition of performability that rewards the system for performance that is delivered to its client applications, after discounting the following consequences of failure: service denial and interruption, lost work, and recovery cost.
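For orientation, the following is a minimal sketch of the traditional textbook definitions that the abstract argues against, together with the product-style figure of merit it mentions. It does not reproduce the paper's revised service reliability or service availability, which the abstract does not spell out. The function names, the constant-failure-rate (exponential) model, and the sample numbers are all assumptions made for illustration.

```python
import math


def standard_reliability(failure_rate: float, mission_time: float) -> float:
    """Textbook reliability R(t) = exp(-lambda * t).

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not the paper's service reliability.
    """
    return math.exp(-failure_rate * mission_time)


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Textbook steady-state availability A = MTTF / (MTTF + MTTR).

    This is the traditional definition, not the paper's service availability.
    """
    return mttf / (mttf + mttr)


def dependability_product(reliability: float, availability: float) -> float:
    """Hypothetical helper: a figure of merit formed as the product of the two metrics."""
    return reliability * availability


if __name__ == "__main__":
    mttf_hours = 1000.0  # assumed mean time to failure
    mttr_hours = 10.0    # assumed mean time to repair
    mission_hours = 100.0

    r = standard_reliability(1.0 / mttf_hours, mission_hours)
    a = steady_state_availability(mttf_hours, mttr_hours)
    print(f"reliability={r:.4f} availability={a:.4f} "
          f"product={dependability_product(r, a):.4f}")
```

Under the abstract's argument, the revised metrics would move these inputs in opposite directions: crediting recovery would raise the reliability term, while charging recovery and network effects would lower the availability term. The product form itself stays the same.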