Recently, the phenomenon of software aging, in which the state of a software system degrades with time, has been reported. This phenomenon, which may eventually lead to performance degradation and/or crash/hang failure, results from exhaustion of operating system resources, data corruption, and numerical error accumulation. To counteract software aging, a technique called software rejuvenation has been proposed, which essentially involves occasionally terminating an application or a system, cleaning its internal state and/or its environment, and restarting it. Since rejuvenation incurs an overhead, an important research issue is to determine the optimal times to initiate it. In this paper, we first describe how to include faults attributed to software aging in the framework of Gray's software fault classification (deterministic and transient), and study the treatment and recovery strategies for each of the fault classes. We then construct a semi-Markov reward model based on workload and resource usage data collected from the UNIX operating system. We identify different workload states using statistical cluster analysis, and estimate transition probabilities and sojourn time distributions from the data. For each resource, a reward function is then defined for the model based on the rate of resource depletion in each state. The model is then solved to obtain the estimated time to exhaustion for each resource. The results from the semi-Markov reward model are then fed into a higher-level availability model that accounts for failure followed by reactive recovery, as well as proactive recovery. This comprehensive model is then used to derive optimal rejuvenation schedules that maximize availability or minimize downtime cost.
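The semi-Markov reward step described in the abstract can be illustrated with a small sketch. It is not the paper's solution method: it uses a first-order, long-run approximation, assuming the workload states form an irreducible chain and that only the mean sojourn time per state matters. The function names, the two-state example, and all numbers are hypothetical.

```python
def stationary_distribution(P, iters=100_000, tol=1e-13):
    """Stationary distribution of the embedded chain with transition matrix P.

    Uses a lazy power iteration (average of pi and pi*P), which also
    converges for periodic chains as long as the chain is irreducible.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def time_to_exhaustion(P, mean_sojourn, depletion_rate, available):
    """Approximate time until a resource is exhausted.

    P              -- transition probabilities between workload states
    mean_sojourn   -- mean time spent in each state per visit
    depletion_rate -- reward rate: resource consumed per unit time in each state
    available      -- amount of the resource currently free

    Assumption: the long-run average depletion rate is applied from time
    zero, so transient behaviour and sojourn-time variance are ignored.
    """
    pi = stationary_distribution(P)
    time_per_visit = sum(p * h for p, h in zip(pi, mean_sojourn))
    loss_per_visit = sum(p * h * r for p, h, r in zip(pi, mean_sojourn, depletion_rate))
    rate = loss_per_visit / time_per_visit
    if rate <= 0:
        return float("inf")  # resource is not being depleted on average
    return available / rate


# Hypothetical two-state workload (e.g. "light" and "heavy") that alternates.
P = [[0.0, 1.0],
     [1.0, 0.0]]
mean_sojourn = [2.0, 1.0]     # time units per visit
depletion_rate = [1.0, 4.0]   # resource units lost per time unit
estimate = time_to_exhaustion(P, mean_sojourn, depletion_rate, available=600.0)
print(estimate)
```

In this example the chain spends two thirds of its time in the first state, so the average depletion rate is 2 units per time unit and the estimate is 300 time units. An estimate of this kind is the sort of quantity that the higher-level availability model takes as input.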