Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of "software aging", in which the state of a software system degrades with time, has been reported. To counteract this phenomenon, a proactive approach to fault management, called "software rejuvenation", has been proposed. This essentially involves gracefully terminating an application or a system and restarting it in a clean internal state.

In this paper, we discuss stochastic models to evaluate the effectiveness of proactive fault management in operational software systems and to determine optimal times to perform rejuvenation for different scenarios. The latter part of the paper deals with measurement-based methodologies to detect software aging and estimate its effect on various system resources. Models are constructed using workload and resource usage data collected from the UNIX operating system over a period of time. The measurement-based models are intended to help develop strategies for software rejuvenation triggered by actual measurements.
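The stochastic models themselves are not given in this abstract. As a minimal sketch, assume a hypothetical four-state continuous-time Markov chain: the system moves from a robust state to a failure-probable state at aging rate `r2`, fails from there at rate `lam`, and is repaired at rate `r1`; if rejuvenation is scheduled, it leaves the failure-probable state for rejuvenation at rate `r4` and returns to the robust state at rate `r3`. The class name, state names, rates, and cost figures are illustrative assumptions, not values or methods taken from the paper. Solving the balance equations gives steady-state probabilities, from which availability and an expected downtime cost per hour follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgingModel:
    """Hypothetical 4-state CTMC: robust -> failure-probable -> failed,
    with optional rejuvenation out of the failure-probable state.
    All rates are per hour and are illustrative assumptions."""
    r2: float   # aging rate: robust -> failure-probable
    lam: float  # failure rate: failure-probable -> failed
    r1: float   # repair rate: failed -> robust
    r3: float   # rejuvenation completion rate: rejuvenating -> robust

    def steady_state(self, r4: float) -> dict:
        """Closed-form solution of the balance equations for rejuvenation rate r4."""
        # Unnormalized probabilities, taking pi_robust = 1:
        p_prob = self.r2 / (self.lam + r4)    # r2 * pi_robust = (lam + r4) * pi_prob
        p_fail = p_prob * self.lam / self.r1  # lam * pi_prob = r1 * pi_fail
        p_rejuv = p_prob * r4 / self.r3       # r4 * pi_prob = r3 * pi_rejuv
        total = 1.0 + p_prob + p_fail + p_rejuv
        return {
            "robust": 1.0 / total,
            "probable": p_prob / total,
            "failed": p_fail / total,
            "rejuvenating": p_rejuv / total,
        }

    def availability(self, r4: float) -> float:
        """The service is up in the robust and failure-probable states."""
        pi = self.steady_state(r4)
        return pi["robust"] + pi["probable"]

    def cost_rate(self, r4: float, c_fail: float, c_rejuv: float) -> float:
        """Expected downtime cost per hour: unplanned outages cost c_fail per hour,
        planned rejuvenation costs c_rejuv per hour."""
        pi = self.steady_state(r4)
        return c_fail * pi["failed"] + c_rejuv * pi["rejuvenating"]

    def rejuvenation_pays_off(self, c_fail: float, c_rejuv: float) -> bool:
        """cost_rate is a ratio of two linear functions of r4, hence monotone,
        so comparing r4 = 0 with the limit r4 -> infinity is enough."""
        never = self.cost_rate(0.0, c_fail, c_rejuv)
        x = self.r2 / self.r3  # in the limit only robust and rejuvenating states remain
        always = c_rejuv * x / (1.0 + x)
        return always < never


if __name__ == "__main__":
    # Hypothetical parameters: aging after ~10 days, failure ~30 days later,
    # 30-minute repair, 5-minute rejuvenation.
    model = AgingModel(r2=1 / 240, lam=1 / 720, r1=2.0, r3=12.0)
    for r4 in (0.0, 0.01, 0.1, 1.0):
        print(f"r4={r4:<5} availability={model.availability(r4):.6f} "
              f"cost/h={model.cost_rate(r4, c_fail=5000, c_rejuv=50):.4f}")
    print("rejuvenate?", model.rejuvenation_pays_off(c_fail=5000, c_rejuv=50))
```

Because both the downtime cost and the normalizing constant are linear in `r4`, the cost rate in this simplified exponential model is monotone in `r4`, so the best policy lies at an endpoint: never rejuvenate, or rejuvenate in the limit `r4 -> infinity`. Interior optimal rejuvenation intervals can arise in richer models, for example with non-exponential rejuvenation timers.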
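The abstract does not spell out the measurement-based detection procedure. A minimal sketch, assuming a nonparametric approach commonly used in software-aging studies: the Mann-Kendall test checks whether a resource series has a monotone trend, and Sen's slope estimator gives its rate, which can be extrapolated to a projected time to exhaustion. The free-memory series, the 100 MB floor, and the function names are hypothetical, and the paper's own methodology may differ (for example, by accounting for periodic workload patterns).

```python
import math
from statistics import median


def mann_kendall_z(values):
    """Mann-Kendall trend statistic (normal approximation, no tie correction).
    Z well below -1.96 suggests a significant downward trend at the 5% level."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise change
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)  # continuity correction
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(times, values):
    """Sen's estimator: median of all pairwise slopes (robust to outliers)."""
    n = len(values)
    slopes = [(values[j] - values[i]) / (times[j] - times[i])
              for i in range(n - 1) for j in range(i + 1, n)
              if times[j] != times[i]]
    return median(slopes)


def time_to_exhaustion(times, values, floor=0.0):
    """Extrapolate the Sen line from the last sample time down to `floor`.
    Returns math.inf when no downward trend is estimated."""
    slope = sens_slope(times, values)
    if slope >= 0:
        return math.inf
    intercept = median([v - slope * t for t, v in zip(times, values)])
    level_now = intercept + slope * times[-1]
    return max(0.0, (level_now - floor) / -slope)


if __name__ == "__main__":
    # Hypothetical hourly free-memory samples (MB): a 5 MB/h leak plus small noise.
    hours = list(range(24))
    free_mb = [1000 - 5 * t + ((7 * t) % 5 - 2) for t in hours]
    print("Mann-Kendall Z:", round(mann_kendall_z(free_mb), 2))
    print("Sen slope (MB/h):", sens_slope(hours, free_mb))
    print("hours until 100 MB:", round(time_to_exhaustion(hours, free_mb, floor=100.0), 1))
```

Sen's median-of-slopes estimator is used here instead of least squares because a few workload spikes barely move it; a projected time to exhaustion of this kind is one way a measurement-based trigger could schedule rejuvenation before a resource runs out.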