Detecting and diagnosing faults in a large-scale distributed system is a formidable task. Interest in monitoring user-request traces and using them for fault detection has grown recently. In this paper we propose novel fault detection methods based on abnormal trace detection. One essential problem is how to represent a large amount of training trace data compactly so that it can serve as an oracle. Our key contribution is the novel use of varied-length n-grams and automata to characterize normal traces. A new trace is compared against the learned automata to determine whether it is abnormal. We develop algorithms that automatically extract n-grams and construct multi-resolution automata from training data. Furthermore, both deterministic and multi-hypothesis algorithms are proposed for detection. We inspect the trace constraints of real application software and verify the existence of long n-grams. Our approach is tested on a real system with injected faults and achieves good results.
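As a rough illustration of the idea in the abstract, the following Python sketch learns varied-length n-grams from normal traces, links them into a simple automaton, and flags a new trace whose segmentation uses an unseen symbol or an unseen transition. It is not the paper's algorithm. The frequency threshold `min_count`, the greedy longest-match segmentation, the single-resolution automaton, and all function names are assumptions made for the example. The multi-resolution construction and the multi-hypothesis detector are not shown.

```python
from collections import Counter

# Hypothetical sentinel states marking the start and end of a trace.
START, END = ("<start>",), ("<end>",)

def extract_ngrams(traces, max_n=3, min_count=2):
    """Collect n-grams (1 <= n <= max_n) seen at least min_count times."""
    counts = Counter()
    for trace in traces:
        for n in range(1, max_n + 1):
            for i in range(len(trace) - n + 1):
                counts[tuple(trace[i:i + n])] += 1
    # Assumption: always keep single symbols so training traces can be segmented.
    return {g for g, c in counts.items() if len(g) == 1 or c >= min_count}

def segment(trace, ngrams, max_n=3):
    """Greedy longest-match segmentation; None if an unknown symbol appears."""
    out, i = [], 0
    while i < len(trace):
        for n in range(min(max_n, len(trace) - i), 0, -1):
            gram = tuple(trace[i:i + n])
            if gram in ngrams:
                out.append(gram)
                i += n
                break
        else:
            return None  # no known n-gram starts at position i
    return out

def build_automaton(traces, ngrams, max_n=3):
    """States are n-grams; record every transition seen in training traces."""
    transitions = set()
    for trace in traces:
        states = [START] + segment(trace, ngrams, max_n) + [END]
        transitions.update(zip(states, states[1:]))
    return transitions

def is_abnormal(trace, ngrams, transitions, max_n=3):
    """Deterministic check: any unknown symbol or unseen transition is abnormal."""
    seg = segment(trace, ngrams, max_n)
    if seg is None:
        return True
    states = [START] + seg + [END]
    return any(t not in transitions for t in zip(states, states[1:]))
```

A trace here is simply a list of component names visited by one request, for example `["web", "app", "db", "app", "web"]`. Longer n-grams give a more compact automaton, while greedy matching commits to a single segmentation, which is the kind of limitation a multi-hypothesis detector is meant to address.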