Software aging phenomena have only recently been studied; one particularly complex type is shared memory pool latch contention in large OLTP servers. The onset of latch contention leads to severe performance degradation that persists until a manual rejuvenation of the DBMS shared memory pool is triggered. Conventional approaches to automated rejuvenation have failed for latch contention because no single resource metric has been identified that can be monitored to signal the onset of this complex mechanism. The current investigation explores the feasibility of applying an advanced pattern recognition method, embodied in a commercially available equipment condition monitoring system (SmartSignal eCM™), for proactive annunciation of software-aging faults. One hundred data signals from a large OLTP server are monitored, collected at 20 to 60 second intervals over a 5-month period. Results show that 13 variables consistently deviate from normal operation prior to a latch event, providing up to 2 hours of early warning.
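The idea of watching many signals at once for deviation from normal operation can be illustrated with a minimal sketch. This is not the SmartSignal eCM method, whose details the abstract does not give; it swaps in a plain per-signal z-score against a baseline learned from normal data, and raises an alert only when several signals deviate together for several consecutive samples. The function names, the threshold of 3 standard deviations, and the `min_signals` and `persistence` parameters are all assumptions made for illustration.

```python
from statistics import mean, stdev


def fit_baseline(normal_samples):
    """Learn (mean, std) per signal from rows recorded during normal operation."""
    columns = zip(*normal_samples)  # transpose rows into per-signal columns
    return [(mean(col), stdev(col)) for col in columns]


def deviating_signals(baseline, row, threshold=3.0):
    """Return indices of signals whose z-score exceeds the (assumed) threshold."""
    flagged = []
    for i, ((mu, sigma), value) in enumerate(zip(baseline, row)):
        # Skip constant signals: a zero std gives no usable z-score.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def first_alert(baseline, rows, min_signals=2, persistence=3, threshold=3.0):
    """Index of the first sample that starts a run of `persistence` consecutive
    samples in which at least `min_signals` signals deviate; None if no such run."""
    streak = 0
    for t, row in enumerate(rows):
        if len(deviating_signals(baseline, row, threshold)) >= min_signals:
            streak += 1
            if streak >= persistence:
                return t - persistence + 1
        else:
            streak = 0
    return None


# Hypothetical data: three signals, four samples of normal operation.
normal = [[10, 5, 1], [11, 5.5, 1.1], [9, 4.5, 0.9], [10, 5, 1]]
baseline = fit_baseline(normal)

# Hypothetical monitoring window in which signals 0 and 1 drift upward.
window = [
    [10, 5, 1],
    [10.5, 5, 1],
    [20, 9, 1],
    [21, 9.5, 1],
    [22, 10, 1],
    [23, 10, 1],
]
alert_index = first_alert(baseline, window)
```

Requiring agreement across several signals and across consecutive samples reflects the abstract's point that no single metric is a reliable indicator; the lead time of an alert would be the gap between `alert_index` and the sample at which the latch event itself is observed.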