With the ever-growing complexity and dynamism of computer systems, proactive fault management is an effective approach to enhancing availability, and online failure prediction is the key to such techniques. In contrast to classical reliability methods, online failure prediction relies on runtime monitoring and on a variety of models and methods that use the current state of a system and, frequently, past experience as well. This survey describes these methods. To capture the wide spectrum of approaches in this area, a taxonomy has been developed; each of its categories is explained, and the major concepts are described in detail.
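To make "runtime monitoring of the current state" concrete, here is a minimal sketch of one very simple kind of online predictor. It is not the survey's method. It assumes a single monitored resource, such as used memory, sampled at known timestamps. It fits a least-squares line to a sliding window of recent samples and raises a warning when the extrapolated time to reach a capacity limit falls within a chosen lead time. The class `TrendPredictor` and its parameters `capacity`, `window`, and `lead_time` are hypothetical names chosen for illustration.

```python
from collections import deque


class TrendPredictor:
    """Hypothetical sliding-window trend predictor (illustrative only).

    Fits a least-squares line to the most recent samples of one resource
    and extrapolates when the resource will reach its capacity limit.
    """

    def __init__(self, capacity, window=10, lead_time=60.0):
        self.capacity = capacity              # resource limit, same unit as samples
        self.lead_time = lead_time            # warn if exhaustion is predicted within this time
        self.samples = deque(maxlen=window)   # (timestamp, value); oldest samples drop out

    def observe(self, t, value):
        """Record one monitoring sample taken at time t."""
        self.samples.append((t, value))

    def time_to_exhaustion(self):
        """Predicted time from the latest sample until capacity is reached,
        or None if there is too little data or no upward trend."""
        n = len(self.samples)
        if n < 2:
            return None
        ts = [t for t, _ in self.samples]
        vs = [v for _, v in self.samples]
        t_mean = sum(ts) / n
        v_mean = sum(vs) / n
        sxx = sum((t - t_mean) ** 2 for t in ts)
        if sxx == 0:
            return None  # all samples share one timestamp: slope undefined
        slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs)) / sxx
        if slope <= 0:
            return None  # flat or decreasing usage: no exhaustion predicted
        intercept = v_mean - slope * t_mean
        t_exhaust = (self.capacity - intercept) / slope
        return max(0.0, t_exhaust - ts[-1])

    def warn(self):
        """True if exhaustion is predicted within the lead time."""
        tte = self.time_to_exhaustion()
        return tte is not None and tte <= self.lead_time
```

For example, if usage grows by one unit per second from 100 toward a capacity of 200, the predictor stays silent early on. Once the extrapolated exhaustion point is less than `lead_time` away, it starts warning. The fixed window lets the estimate follow the current trend rather than the whole history. The thresholds stand in, very crudely, for the "past experience" that richer predictors learn from data.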