A number of recent studies have reported the phenomenon of "software aging", characterized by progressive performance degradation or a sudden hang/crash of a software system due to exhaustion of operating system resources, fragmentation, and accumulation of errors. To counteract this phenomenon, a proactive technique called "software rejuvenation" has been proposed. It essentially involves stopping the running software, cleaning its internal state, and then restarting it. Because software rejuvenation is preventive in nature, it raises the question of when to schedule it. Periodic rejuvenation, while straightforward to implement, may not yield the best results. A better approach is based on actual measurement of system resource usage and activity, from which resource exhaustion is detected and exhaustion times are estimated. Estimating the resource exhaustion times makes it possible to initiate or better plan software rejuvenation so that system availability is maximized in the face of time-varying workload and system behavior. In this paper, we propose a methodology based on time-series analysis to detect and estimate resource exhaustion times due to software aging in a web server while subjecting it to an artificial workload. We first collect and log data on several system resource usage and activity parameters on a web server. Time-series ARMA models are then constructed from the data to detect aging and estimate resource exhaustion times. The results are then compared with previous measurement-based models and found to be more efficient and computationally less intensive. These models can be used to develop proactive management techniques, such as software rejuvenation, that are triggered by actual measurements.
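To make the idea of estimating a resource exhaustion time from logged measurements concrete, the following is a minimal sketch and not the models constructed in the paper. It assumes a single resource-usage series sampled at regular intervals, fits a linear trend with AR(1) residuals (a simple special case within the ARMA family), and forecasts forward until a hypothetical capacity limit is reached. The function names, the synthetic data, and all parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_usage(n=500, start=100.0, slope=0.5, phi=0.6, noise_sd=5.0, seed=0):
    """Synthetic resource-usage series (assumption, not measured data):
    a linear upward drift plus AR(1)-correlated noise."""
    rng = np.random.default_rng(seed)
    noise = np.zeros(n)
    for t in range(1, n):
        noise[t] = phi * noise[t - 1] + rng.normal(0.0, noise_sd)
    return start + slope * np.arange(n) + noise


def fit_trend_ar1(y):
    """Fit y[t] = a + b*t + e[t], where e[t] = phi*e[t-1] + white noise."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    b, a = np.polyfit(t, y, 1)  # returns slope first, then intercept
    resid = y - (a + b * t)
    denom = float(resid[:-1] @ resid[:-1])
    phi = float(resid[1:] @ resid[:-1]) / denom if denom > 0 else 0.0
    # Keep the fitted residual process stationary so forecasts stay bounded.
    phi = float(np.clip(phi, -0.99, 0.99))
    return a, b, phi, float(resid[-1])


def estimate_exhaustion_step(y, limit, max_horizon=100_000):
    """Return the first future time index at which the forecast reaches
    `limit`, or None if that does not happen within `max_horizon` steps."""
    a, b, phi, last_resid = fit_trend_ar1(y)
    last_t = len(y) - 1
    for h in range(1, max_horizon + 1):
        # h-step-ahead forecast: trend plus geometrically decaying AR(1) term
        forecast = a + b * (last_t + h) + (phi ** h) * last_resid
        if forecast >= limit:
            return last_t + h
    return None


if __name__ == "__main__":
    usage = simulate_usage()
    step = estimate_exhaustion_step(usage, limit=1000.0)
    print("estimated exhaustion at sample index:", step)
```

In a measurement-driven setting, an estimate of this kind could be recomputed as new samples arrive and rejuvenation scheduled ahead of the predicted exhaustion time. A full ARMA fit with model-order selection would normally rely on a dedicated time-series library instead of this hand-rolled AR(1) step.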