A Comprehensive Model for Software Rejuvenation

Authors: Kalyanaraman Vaidyanathan; Kishor S. Trivedi
Affiliations: IEEE; IEEE
Venue: IEEE Transactions on Dependable and Secure Computing
Year: 2005

Abstract

Software aging, a phenomenon in which the state of a software system degrades with time, has recently been reported. This phenomenon, which may eventually lead to system performance degradation and/or crash/hang failures, results from the exhaustion of operating system resources, data corruption, and numerical error accumulation. To counteract software aging, a technique called software rejuvenation has been proposed; it essentially involves occasionally terminating an application or a system, cleaning its internal state and/or its environment, and restarting it. Since rejuvenation incurs an overhead, an important research issue is to determine the optimal times to initiate this action. In this paper, we first describe how to include faults attributed to software aging in the framework of Gray's software fault classification (deterministic and transient), and study the treatment and recovery strategies for each of the fault classes. We then construct a semi-Markov reward model based on workload and resource usage data collected from the UNIX operating system. We identify different workload states using statistical cluster analysis, and estimate transition probabilities and sojourn time distributions from the data. For each resource, a reward function is then defined for the model based on the rate of resource depletion in each state. The model is then solved to obtain the estimated time to exhaustion for each resource. The results from the semi-Markov reward model are then fed into a higher-level availability model that accounts for failure followed by reactive recovery, as well as proactive recovery. This comprehensive model is then used to derive optimal rejuvenation schedules that maximize availability or minimize downtime cost.
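
The idea of a semi-Markov reward model with per-state depletion rates can be illustrated with a minimal Python sketch. It is not the authors' solution method: it assumes a simple long-run-average approximation, in which the fraction of time spent in each workload state weights that state's depletion rate, and the time to exhaustion is the available amount of the resource divided by the resulting average rate. The function names, the three workload states, and all numbers are hypothetical and chosen only for illustration.

```python
def embedded_stationary(P, iters=10_000, tol=1e-12):
    """Stationary distribution nu of the embedded chain (nu = nu P)."""
    n = len(P)
    nu = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(nu[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Averaging nu with nu P keeps the same fixed point and also
        # converges when the embedded chain is periodic.
        new = [0.5 * a + 0.5 * b for a, b in zip(nu, step)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    return nu


def time_fractions(P, mean_sojourn):
    """Long-run fraction of time spent in each workload state."""
    nu = embedded_stationary(P)
    weights = [v * h for v, h in zip(nu, mean_sojourn)]
    total = sum(weights)
    return [w / total for w in weights]


def estimated_time_to_exhaustion(P, mean_sojourn, depletion_rate, available):
    """Available resource divided by the long-run average depletion rate."""
    pi = time_fractions(P, mean_sojourn)
    rate = sum(p * r for p, r in zip(pi, depletion_rate))
    if rate <= 0:
        return float("inf")  # the resource is not being depleted on average
    return available / rate


# Hypothetical inputs: three workload states (e.g. found by clustering).
P = [[0.0, 0.6, 0.4],            # transition probabilities between states
     [0.5, 0.0, 0.5],
     [0.7, 0.3, 0.0]]
mean_sojourn = [120.0, 45.0, 300.0]   # mean time in each state (seconds)
depletion_rate = [0.8, 2.5, 0.1]      # resource units lost per second
available = 50_000.0                  # resource units currently free

print(estimated_time_to_exhaustion(P, mean_sojourn, depletion_rate, available))
```

In a measurement-based setting, the transition matrix, sojourn times, and depletion rates would be estimated from the collected workload and resource usage data rather than fixed by hand, and the resulting estimate would be one input to a higher-level availability model.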