Anomaly detection for symbolic sequence data is an important area of research that is relevant in many application domains. While several techniques have been proposed within different domains, their relative strengths and weaknesses are poorly understood. The main reason is that the nature of sequence data varies significantly across domains, so a technique that performs well in its original domain is not guaranteed to perform well in another. In this paper, we aim to establish this understanding for a wide variety of anomaly detection techniques for symbolic sequences. We present a comparative evaluation of a large number of anomaly detection techniques on a variety of publicly available and artificially generated data sets. Many of these are existing techniques, while some are slight variants or adaptations of traditional anomaly detection techniques to sequence data. The analysis allows a relative comparison of the techniques and highlights their strengths and weaknesses. We extend the reference based analysis (RBA) framework, originally proposed for analyzing multivariate categorical data, to symbolic sequence data sets. We visualize the symbolic sequences using the characteristics provided by the RBA framework and use the visualization to understand various aspects of the sequence data. We then use the RBA characterization to explain the performance of the different techniques. Using the RBA framework, we propose two anomaly detection techniques for symbolic sequences that show consistently superior performance over the existing techniques across the different data sets.