Software rejuvenation is the concept of gracefully terminating an application and immediately restarting it at a clean internal state. In a client-server application where the server is intended to run perpetually to provide a service to its clients, rejuvenating the server process periodically, during the server's most idle time, increases the availability of that service. In a long-running, computation-intensive application, rejuvenating the application periodically and restarting it from a previous checkpoint increases the likelihood that the execution completes successfully. We present a model for analyzing software rejuvenation in such continuously running applications and express the downtime, and the costs due to downtime during rejuvenation, in terms of the parameters of that model. Threshold conditions for rejuvenation to be beneficial are also derived. We implemented a reusable module to perform software rejuvenation. That module can be embedded in any existing application on a UNIX platform with minimal effort. Experiences with software rejuvenation in a billing data collection subsystem of a telecommunications operations system, and in other continuously running systems and scientific applications at AT&T, are described.
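The basic mechanism described above (terminate gracefully, then restart from a clean state) can be illustrated with a minimal sketch. This is not the reusable module from the paper; it is a hypothetical external supervisor, and the names `run_with_rejuvenation`, `interval_s`, `cycles`, and `grace_s` are assumptions introduced for illustration. It also assumes a fixed rejuvenation interval rather than choosing the server's most idle time, and it does no checkpointing.

```python
import subprocess


def run_with_rejuvenation(cmd, interval_s, cycles, grace_s=5.0):
    """Run `cmd` for `cycles` lifetimes, rejuvenating it every `interval_s` seconds.

    Hypothetical sketch: each lifetime starts a fresh process (a clean
    internal state). When the interval elapses, the process is asked to
    terminate gracefully; it is killed only if it ignores the request
    for longer than `grace_s` seconds. Returns the exit code of every
    instance that was run.
    """
    exit_codes = []
    for _ in range(cycles):
        proc = subprocess.Popen(cmd)
        try:
            # If the process exits on its own (for example, it crashed),
            # wait() returns early and the loop simply restarts it.
            proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            # Interval elapsed: request a graceful shutdown (SIGTERM on UNIX).
            proc.terminate()
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # The process did not shut down in time: force it.
                proc.kill()
                proc.wait()
        exit_codes.append(proc.returncode)
    return exit_codes
```

A call such as `run_with_rejuvenation(["./server"], interval_s=86400, cycles=7)` would restart a placeholder `./server` binary once a day for a week. A real deployment would pick the restart moment from load measurements and, for long computations, resume from the latest checkpoint, neither of which this sketch attempts.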