On the role of information compaction to intrusion detection

Authors:
Fernando Godínez;Dieter Hutter;Raúl Monroy
Affiliations:
Centre for Intelligent Systems, ITESM-Monterrey, Monterrey, Mexico;DFKI, Saarbrücken University, Saarbrücken, Germany;Department of Computer Science, ITESM–Estado de México, Estado de México, Mexico
Venue:
ISSADS'05 Proceedings of the 5th international conference on Advanced Distributed Systems
Year:
2005


Abstract

An intrusion detection system (IDS) usually has to analyse gigabytes of audit information. In an anomaly IDS, this information is used to build a user profile characterising normal behaviour, whereas in a misuse IDS it is tested against known attacks. Probabilistic methods, e.g. hidden Markov models (HMMs), have proved suitable for profile formation but are prohibitively expensive. To bring these methods into practice, this paper aims to reduce the audit information by folding up subsequences that commonly occur within it. Using n-gram language models, we have been able to identify the n-grams that appear most frequently. The main contribution of this paper is an n-gram extraction and identification process that significantly reduces an input log file while keeping key information for intrusion detection. We reduced log files by a factor of 3.6 in the worst case and 4.8 in the best case. We also tested the reduced data using HMMs for intrusion detection. The time needed to train the HMMs is greatly reduced by using our reduced log files and, most importantly, the impact on both the detection and false positive ratios is negligible.
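
The folding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' extraction and identification process, whose details the abstract does not give; it assumes the audit log is already a flat list of event names, uses a fixed n-gram length, and replaces non-overlapping occurrences of the most frequent n-grams with a placeholder symbol. The function names (count_ngrams, fold_ngram, compact), the symbol format "<G0>", and the toy log are all hypothetical.

```python
from collections import Counter


def count_ngrams(events, n):
    """Count every contiguous n-gram (as a tuple) in the event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def fold_ngram(events, ngram, symbol):
    """Replace non-overlapping occurrences of ngram, scanning left to right."""
    n = len(ngram)
    out = []
    i = 0
    while i < len(events):
        if tuple(events[i:i + n]) == ngram:
            out.append(symbol)   # one symbol stands in for n events
            i += n
        else:
            out.append(events[i])
            i += 1
    return out


def compact(events, n=3, top_k=1):
    """Fold the top_k most frequent n-grams; return the reduced log and the
    symbol table needed to expand it again (both names are illustrative)."""
    counts = count_ngrams(events, n)
    folded = list(events)
    table = {}
    for idx, (ngram, _) in enumerate(counts.most_common(top_k)):
        symbol = f"<G{idx}>"
        table[symbol] = ngram
        folded = fold_ngram(folded, ngram, symbol)
    return folded, table


# Toy audit log (an assumption, not data from the paper).
log = ["open", "read", "close"] * 4 + ["write"]
folded, table = compact(log, n=3, top_k=1)
ratio = len(log) / len(folded)   # reduction factor for this toy log
print(folded, table, ratio)
```

On this toy log the 13 events fold down to 5 symbols. The reduction factors quoted in the abstract (3.6 to 4.8) come from the authors' own data and method, not from a sketch like this one. Keeping the symbol table alongside the reduced log is one way to retain the information a downstream detector might need.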