Software Rejuvenation: Analysis, Module and Applications

Authors:
Nick Kolettis;N. Dudley Fulton
Affiliations:
-;-
Venue:
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Year:
1995

Abstract

Software rejuvenation is the concept of gracefully terminating an application and immediately restarting it at a clean internal state. In a client-server application where the server is intended to run perpetually to provide a service to its clients, rejuvenating the server process periodically during the server's most idle time increases the availability of that service. In a long-running, computation-intensive application, rejuvenating the application periodically and restarting it at a previous checkpoint increases the likelihood of successfully completing the application execution. We present a model for analyzing software rejuvenation in such continuously running applications and express downtime, and the costs due to downtime during rejuvenation, in terms of the parameters of that model. Threshold conditions for rejuvenation to be beneficial are also derived. We implemented a reusable module to perform software rejuvenation. That module can be embedded in any existing application on a UNIX platform with minimal effort. Experiences with software rejuvenation in a billing data collection subsystem of a telecommunications operations system, and in other continuously running systems and scientific applications at AT&T, are described.
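
The core idea of periodic rejuvenation (terminate gracefully, then restart at a clean state) can be illustrated with a small supervisor loop. The following is only a minimal sketch under stated assumptions, not the reusable module described in the abstract: the function name `run_with_rejuvenation` and the parameters `interval_s`, `cycles`, and `grace_s` are hypothetical, the rejuvenation interval is fixed rather than derived from any model, and no checkpoint is restored on restart.

```python
import subprocess
import sys
import time


def run_with_rejuvenation(cmd, interval_s, cycles, grace_s=5.0):
    """Run cmd and rejuvenate it `cycles` times, once every `interval_s` seconds.

    Hypothetical illustration: each rejuvenation asks the process to exit
    gracefully, waits up to `grace_s` seconds, and then starts a fresh copy.
    Returns the number of restarts performed.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    try:
        for _ in range(cycles):
            time.sleep(interval_s)
            proc.terminate()  # SIGTERM on UNIX: request a graceful exit
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate only if the process ignores the request
                proc.wait()
            proc = subprocess.Popen(cmd)  # fresh process, clean internal state
            restarts += 1
    finally:
        # Stop the last instance so the sketch leaves nothing running.
        proc.terminate()
        proc.wait()
    return restarts


if __name__ == "__main__":
    # Stand-in for a long-running server: a child that just sleeps.
    worker = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_rejuvenation(worker, interval_s=0.2, cycles=3))
```

A real deployment would schedule the restart for the server's most idle time, as the abstract describes, and a computation-intensive application would resume from its most recent checkpoint instead of starting over.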