Temporal sequence learning and data reduction for anomaly detection
ACM Transactions on Information and System Security (TISSEC)
Characterizing the behavior of a program using multiple-length N-grams
Proceedings of the 2000 workshop on New security paradigms
Anomaly Detection in Embedded Systems
IEEE Transactions on Computers - Special issue on fault-tolerant embedded systems
Mimicry attacks on host-based intrusion detection systems
Proceedings of the 9th ACM conference on Computer and communications security
Benchmarking Anomaly-Based Detection Systems
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Data Reduction Techniques for Instance-Based Learning from Human/Computer Interface Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
A sense of self for Unix processes
SP'96 Proceedings of the 1996 IEEE conference on Security and privacy
Hi-index | 0.00 |
An intrusion detection system (IDS) usually has to analyse Giga-bytes of audit information. In the case of anomaly IDS, the information is used to build a user profile characterising normal behaviour. Whereas for misuse IDSs, it is used to test against known attacks. Probabilistic methods, e.g. hidden Markov models, have proved to be suitable to profile formation but are prohibitively expensive. To bring these methods into practise, this paper aims to reduce the audit information by folding up subsequences that commonly occur within it. Using n-grams language models, we have been able to successfully identify the n-grams that appear most frequently. The main contribution of this paper is a n-gram extraction and identification process that significantly reduces an input log file keeping key information for intrusion detection. We reduced log files by a factor of 3.6 in the worst case and 4.8 in the best case. We also tested reduced data using hidden Markov models (HMMs) for intrusion detection. The time needed to train the HMMs is greatly reduced by using our reduced log files, but most importantly, the impact on both the detection and false positive ratios are negligible.