Automatic classification of security messages based on text categorization

Authors:
Fatiha Benali;Stéphane Ubéda;Véronique Legrand
Affiliations:
ARES INRIA/CITI, INSA, Lyon, France;ARES INRIA/CITI, INSA, Lyon, France;ARES INRIA/CITI, INSA, Lyon, France
Venue:
NOTERE '08 Proceedings of the 8th international conference on New technologies in distributed systems
Year:
2008

Abstract

Messages generated by security devices are the data needed to detect malicious activity in an information system. The heterogeneity of these devices and the lack of a standard format for security messages make automatic processing difficult: the messages are short, draw on a very wide vocabulary, and come in different formats. In this article we apply text categorization techniques to classify security log messages automatically into categories defined by an ontology. We develop a module that extracts message attributes in order to reduce the vocabulary size. We then apply two learning algorithms, k-nearest neighbour and naive Bayes, to two corpora of security log messages.
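
As a rough illustration of the pipeline the abstract describes, the sketch below normalizes variable fields in log messages to shrink the vocabulary, then classifies a message with a multinomial naive Bayes model and with a k-nearest-neighbour vote over cosine similarity. It is not the authors' implementation: the normalization rules (`IP_RE`, `NUM_RE`), the sample messages, and the category labels are all hypothetical stand-ins for the paper's extraction module, corpora, and ontology.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical normalization rules standing in for the attribute-extraction
# module: variable fields become placeholder tokens, shrinking the vocabulary.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
NUM_RE = re.compile(r"\b\d+\b")

def tokenize(message):
    text = IP_RE.sub(" ipaddr ", message.lower())
    text = NUM_RE.sub(" num ", text)
    return re.findall(r"[a-z]+", text)

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for msg, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(msg))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, message):
        # Tokens never seen in training are skipped.
        tokens = [t for t in tokenize(message) if t in self.vocab]
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(train, message, k=3):
    """Majority vote among the k training messages most similar to the query."""
    query = Counter(tokenize(message))
    ranked = sorted(
        train,
        key=lambda pair: cosine(query, Counter(tokenize(pair[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled messages; the categories are illustrative only.
TRAIN = [
    ("Failed password for root from 10.0.0.5 port 22", "authentication_failure"),
    ("Failed password for admin from 192.168.1.9 port 2222", "authentication_failure"),
    ("Invalid user guest login failed from 172.16.0.3", "authentication_failure"),
    ("Accepted password for alice from 10.0.0.7 port 22", "authentication_success"),
    ("Accepted publickey for bob from 10.0.0.8 port 22", "authentication_success"),
    ("User carol login succeeded session opened", "authentication_success"),
    ("Firewall denied tcp connection from 10.1.1.1 to 10.2.2.2", "access_denied"),
    ("Packet dropped by firewall rule 17 from 8.8.8.8", "access_denied"),
    ("Connection denied by firewall policy 4", "access_denied"),
]

nb = NaiveBayes().fit([m for m, _ in TRAIN], [c for _, c in TRAIN])
query = "Failed password for oracle from 203.0.113.4 port 22"
print(nb.predict(query), knn_predict(TRAIN, query))
```

Replacing addresses and numbers with placeholder tokens means that two messages differing only in those fields map to the same token sequence, which is one simple way to keep a short-message vocabulary manageable; a real extraction module would likely handle many more attribute types.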