Model based approach for autonomic availability management

Authors:
Kesari Mishra;Kishor S. Trivedi
Affiliations:
Dept. of Electrical and Computer Engineering, Duke University, Durham, NC;Dept. of Electrical and Computer Engineering, Duke University, Durham, NC
Venue:
ISAS'06 Proceedings of the Third international conference on Service Availability
Year:
2006




Abstract

As increasingly complex computer systems take on a controlling role in all aspects of modern life, system availability and the associated downtime have acquired critical importance. Losses due to system downtime have risen manifold and become wide-ranging. Even though component-level availability of hardware and software has increased considerably, system-wide availability still needs improvement, because the heterogeneity of components and the complexity of their interconnections have increased considerably as well. As systems become more interconnected and diverse, architects are less able to anticipate and design for every interaction among components, leaving such issues to be dealt with at runtime. In this paper, we therefore propose an approach for autonomic management of system availability that provides real-time evaluation, monitoring and management of the availability of systems in critical applications. The approach is hybrid: analytic models provide the behavioral abstraction of components and subsystems, their interconnections and dependencies, while statistical inference is applied to data from real-time monitoring of those components and subsystems to parameterize the system availability model. The model is solved online (that is, in real time), so that at any instant both point and interval estimates of the overall system availability are obtained by propagating the point and interval estimates of each input parameter through the system model. This online monitoring and estimation of system availability can then drive adaptive online control of system availability.
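
The idea of propagating point and interval estimates of the input parameters through a system model can be illustrated with a small sketch. It is not the authors' model or implementation. It assumes exponentially distributed times to failure and repair, a hypothetical topology (two redundant web servers in series with one database), made-up monitoring data, and a Monte Carlo approximation that treats each estimated rate as gamma-distributed around its maximum-likelihood value. All names (`OBSERVATIONS`, `system_availability`, `estimate`) are illustrative.

```python
import random


def availability(failure_rate, repair_rate):
    """Steady-state availability of one repairable component."""
    return repair_rate / (failure_rate + repair_rate)


def system_availability(a_web, a_db):
    """Hypothetical system model: two web servers in parallel, in series with a database."""
    return (1.0 - (1.0 - a_web) ** 2) * a_db


def sample_rate(durations, rng):
    """Draw a plausible rate given observed durations (assumed exponential).

    The draw comes from Gamma(shape=n, scale=1/total_time), whose mean is the
    maximum-likelihood rate n / total_time.
    """
    return rng.gammavariate(len(durations), 1.0 / sum(durations))


def estimate(observations, draws=5000, level=0.90, seed=1):
    """Return (point_estimate, (lower, upper)) for system availability."""
    rng = random.Random(seed)

    # Point estimate: plug the maximum-likelihood rates into the system model.
    point_inputs = {
        name: availability(len(up) / sum(up), len(down) / sum(down))
        for name, (up, down) in observations.items()
    }
    point = system_availability(point_inputs["web"], point_inputs["db"])

    # Interval estimate: propagate sampled input parameters through the same model.
    samples = []
    for _ in range(draws):
        a = {
            name: availability(sample_rate(up, rng), sample_rate(down, rng))
            for name, (up, down) in observations.items()
        }
        samples.append(system_availability(a["web"], a["db"]))
    samples.sort()
    lower = samples[int((1.0 - level) / 2.0 * draws)]
    upper = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return point, (lower, upper)


# Made-up monitoring data in hours: (times to failure, times to repair).
OBSERVATIONS = {
    "web": ([700, 820, 650, 910, 760], [1.5, 2.0, 0.8, 1.2, 2.5]),
    "db": ([1500, 1300, 1700, 1600], [4.0, 3.0, 5.5, 3.5]),
}

point, (lower, upper) = estimate(OBSERVATIONS)
print(f"system availability: {point:.6f} (90% interval {lower:.6f} to {upper:.6f})")
```

In an online setting, `estimate` would be re-run as new failure and repair observations arrive, so the interval would be expected to narrow as monitoring data accumulates. A controller could then act when the lower bound falls below a target.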