The growing complexity and scale of high-performance computing and future extreme-scale systems make resilience a key issue, since such systems are expected to experience a variety of faults during critical operations. Current resilience solutions, which rely mainly on checkpointing in hardware and applications, are also expected to become infeasible because the time needed to checkpoint and restart will be unacceptable. In this paper, we present concepts for anomaly detection and identification based on analyzing the duration of pattern transition sequences within an execution window. We use a three-dimensional array of features to capture spatial and temporal variability. An anomaly analysis system uses these features to generate an alert immediately and identify the source of the fault when it captures an abnormal behavior pattern, which indicates some kind of software or hardware failure. The main contributions of this paper are the analysis methodology and the feature selection used to detect and identify anomalous behavior. An evaluation with asynchronously injected faults shows a detection rate above 99.9% with no false alarms across a wide range of scenarios, and an accuracy rate of 100% with a short root cause analysis time.
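The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of checking transition durations within an execution window. It is not the paper's method. The trace format (a list of `(pattern, timestamp)` events), the min/max duration profile, and the names `transitions`, `learn_profile`, and `find_anomalies` are all assumptions made for illustration.

```python
def transitions(trace):
    """Yield (prev_pattern, next_pattern, duration) for consecutive events.

    Assumption: a trace is a list of (pattern, timestamp) pairs in time order.
    """
    for (p0, t0), (p1, t1) in zip(trace, trace[1:]):
        yield p0, p1, t1 - t0


def learn_profile(normal_traces, slack=0.0):
    """Record the duration range seen for each transition in fault-free runs.

    `slack` (hypothetical parameter) widens each range to tolerate jitter.
    """
    bounds = {}
    for trace in normal_traces:
        for p0, p1, d in transitions(trace):
            lo, hi = bounds.get((p0, p1), (d, d))
            bounds[(p0, p1)] = (min(lo, d), max(hi, d))
    return {k: (lo - slack, hi + slack) for k, (lo, hi) in bounds.items()}


def find_anomalies(window, profile):
    """Return transitions in `window` that were never seen or took unusually long or short."""
    alerts = []
    for p0, p1, d in transitions(window):
        if (p0, p1) not in profile:
            alerts.append((p0, p1, d, "unseen transition"))
        else:
            lo, hi = profile[(p0, p1)]
            if not lo <= d <= hi:
                alerts.append((p0, p1, d, "unusual duration"))
    return alerts


# Illustrative data only: two fault-free runs and one window with a stalled write.
normal = [
    [("read", 0), ("compute", 2), ("write", 5)],
    [("read", 0), ("compute", 3), ("write", 5)],
]
profile = learn_profile(normal)
print(find_anomalies([("read", 0), ("compute", 2), ("write", 12)], profile))
```

In this sketch the offending transition itself points at where the fault surfaced, which is a much simpler stand-in for the root cause identification the abstract describes.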