Several recent studies have established that most system outages are due to software faults. Given the ever-increasing complexity of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in the near future. In this paper, we classify software faults and discuss various techniques to deal with them in the testing/debugging phase and the operational phase of the software. We discuss the phenomenon of software aging and a preventive maintenance technique, called software rejuvenation, that deals with this problem. We describe stochastic models that evaluate the effectiveness of preventive maintenance in operational software systems and determine optimal times to perform rejuvenation under different scenarios. We also present measurement-based methodologies to detect software aging and estimate its effect on various system resources. These models are intended to help develop software rejuvenation policies. An automated online measurement-based approach has been used in the software rejuvenation agent implemented in a major commercial server.