Preventive maintenance of operational software systems, a novel technique for software fault tolerance, is used specifically to counteract the phenomenon of software “aging”. However, it incurs some overhead. The need for preventive maintenance, not only in general-purpose software systems of mass use but also in safety-critical and highly available systems, clearly indicates the need for an analysis-based approach to determining the optimal times to perform it. In this paper, we present an analytical model of a software system that serves transactions. Due to aging, not only does the service rate of the software decrease with time, but the software itself also experiences crash/hang failures that make it unavailable. Two policies for preventive maintenance are modeled, and expressions are derived for the resulting steady-state availability, the probability that an arriving transaction is lost, and an upper bound on the expected response time of a transaction. Numerical examples are presented to illustrate the applicability of the models.
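The abstract does not give the paper's equations, so the following is only a minimal sketch of how steady-state availability can depend on the maintenance interval. It is not the authors' model. It assumes a generic renewal-reward (age-replacement) formulation in which the time to a crash/hang failure is Weibull distributed, maintenance is started after a fixed interval `delta` if no failure has occurred, and recovery and maintenance take the fixed times `repair_time` and `pm_time`. All names and parameter values are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that no crash/hang failure has occurred by time t (assumed Weibull)."""
    return math.exp(-((t / scale) ** shape))


def steady_state_availability(delta, shape, scale, repair_time, pm_time, steps=10_000):
    """Availability = E[uptime per cycle] / E[cycle length] for a fixed interval delta.

    A cycle ends either with a failure before delta (followed by repair_time of
    downtime) or with preventive maintenance at delta (followed by pm_time).
    """
    # Expected uptime per cycle is the integral of the survival function over
    # [0, delta], approximated here with the trapezoidal rule.
    h = delta / steps
    total = 0.5 * (weibull_survival(0.0, shape, scale) + weibull_survival(delta, shape, scale))
    for i in range(1, steps):
        total += weibull_survival(i * h, shape, scale)
    uptime = total * h

    survive = weibull_survival(delta, shape, scale)
    downtime = (1.0 - survive) * repair_time + survive * pm_time
    return uptime / (uptime + downtime)


def best_interval(candidates, **params):
    """Return the candidate interval with the highest availability (simple grid search)."""
    return max(candidates, key=lambda d: steady_state_availability(d, **params))


if __name__ == "__main__":
    # Hypothetical parameters: shape > 1 models aging (increasing failure rate).
    params = dict(shape=2.0, scale=100.0, repair_time=20.0, pm_time=1.0)
    grid = [10.0, 30.0, 50.0, 100.0, 300.0, 1000.0]
    d = best_interval(grid, **params)
    print(d, steady_state_availability(d, **params))
```

With `shape=1.0` the failure time is exponential, so there is no aging and, under these assumptions, availability only improves as the interval grows. That is consistent with the abstract's point that preventive maintenance is worthwhile specifically as a countermeasure to aging. The sketch covers availability only. The loss probability and response-time bound mentioned in the abstract would require a queueing model of the transaction workload.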