Software aging is the accumulation of errors during the execution of software, which eventually results in a crash or hang failure. Gradual performance degradation may also accompany software aging. Proactive fault management techniques such as "software rejuvenation" [1] may be used to counteract aging if it exists. In this paper, we propose a methodology for detecting and estimating aging in the UNIX operating system. First, we present the design and implementation of an SNMP-based, distributed monitoring tool that collects operating system resource usage and system activity data from networked UNIX workstations at regular intervals. Statistical trend detection techniques are then applied to this data to detect and validate the existence of aging. To quantify the effect of aging on operating system resources, we propose the metric "estimated time to exhaustion", which is calculated using well-known slope estimation techniques. Although the distributed data collection tool is specific to UNIX, the statistical techniques can also be used to detect and estimate aging in other software.
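
The two statistical steps described above, trend detection followed by slope estimation, can be illustrated with a minimal sketch. The abstract does not name the specific techniques, so this sketch assumes the Mann-Kendall test for trend detection and Sen's slope estimator for the slope; both are common choices and are not necessarily the ones used in the paper. The function names, the sample data, and the `capacity` parameter are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall_z(values):
    """Mann-Kendall Z statistic for a monotonic trend (assumed test, no tie correction)."""
    n = len(values)
    # S counts later-minus-earlier sign differences over all ordered pairs.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sen_slope(times, values):
    """Sen's slope (assumed estimator): median of all pairwise slopes."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(values)), 2)
        if times[j] != times[i]
    ]
    return median(slopes)


def estimated_time_to_exhaustion(times, values, capacity):
    """Time until usage reaches `capacity`, extrapolating from the last sample."""
    slope = sen_slope(times, values)
    if slope <= 0:
        return math.inf  # usage is not growing, so no exhaustion is predicted
    return (capacity - values[-1]) / slope


# Hypothetical samples: used memory (MB) polled once per hour.
hours = list(range(10))
used_mb = [100 + 2 * t for t in hours]

z = mann_kendall_z(used_mb)
if abs(z) > 1.96:  # two-sided test at the 5% significance level
    print(estimated_time_to_exhaustion(hours, used_mb, capacity=200))
```

With these synthetic samples the trend is significant and the sketch prints 41.0, meaning the resource would be exhausted about 41 hours after the last sample if the estimated trend continued. Real monitoring data is noisy and may be periodic, in which case a seasonal variant of the trend test would be more appropriate.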