Constructing attribute weights from computer audit data for effective intrusion detection

  • Authors:
  • Wei Wang;Xiangliang Zhang;Sylvain Gombault

  • Affiliations:
  • Center for Quantifiable Quality of Service (Q2S) in Communication Systems, Norwegian University of Science and Technology (NTNU), O.S. Bragstads Plass 2E, 7491 Trondheim, Norway;Laboratoire de Recherche en Informatique, Université Paris-Sud 11, 91405 Orsay Cedex, France;Institut Telecom / Télécom Bretagne / RSM Université européenne de Bretagne, France, 2 rue de la Châtaigneraie, CS 17607, 35576 Cesson-Sévigné Cedex, France

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2009

Abstract

Attribute construction and selection from audit data is the first and a very important step in anomaly intrusion detection. In this paper, we present several cross frequency attribute weights to model user and program behaviors for anomaly intrusion detection. The frequency attribute weights include plain term frequency (TF) and various forms of term frequency-inverse document frequency (tfidf), referred to as Ltfidf, Mtfidf and LOGtfidf. Nearest Neighbor (NN) and k-NN methods with Euclidean and Cosine distance measures, as well as principal component analysis (PCA) and a Chi-square test method, are applied to these frequency attribute weights for anomaly detection. Extensive experiments are performed on the command data from Schonlau et al. The test results show that the LOGtfidf weight gives better detection performance than plain frequency and the other types of weights. Using the LOGtfidf weight, the simple NN method and the PCA method achieve better masquerade detection results than the other 7 methods in the literature, while the Chi-square test consistently returns the worst results. The PCA method is suitable for fast intrusion detection because it reduces data dimensionality, while the NN and k-NN methods are suitable for detection on small data sets because they need no training process. An HTTP log data set collected in a real environment and the sendmail system call data from the University of New Mexico (UNM) are used as well, and the results also demonstrate the effectiveness of the LOGtfidf weight for anomaly intrusion detection.
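
To make the idea of frequency attribute weights concrete, the following is a minimal sketch of weighting blocks of shell commands with a log-scaled tf-idf and scoring a test block by its cosine distance to the nearest training block. The abstract does not give the formulas for Ltfidf, Mtfidf or LOGtfidf, so the weight used here (log-scaled term frequency times a smoothed inverse document frequency) is only an assumed stand-in, not the paper's definition. The function names and the toy command blocks are hypothetical and are not taken from the Schonlau et al. data.

```python
import math
from collections import Counter


def build_weigher(train_blocks):
    """Return a function that maps a command block to a sparse weight vector.

    Assumed weighting (not the paper's exact formula):
    weight = log(1 + tf) * (log((1 + N) / (1 + df)) + 1)
    """
    n_blocks = len(train_blocks)
    # Document frequency: number of training blocks containing each command.
    doc_freq = Counter(cmd for block in train_blocks for cmd in set(block))

    def weigh(block):
        counts = Counter(block)
        vector = {}
        for cmd, tf in counts.items():
            # Commands unseen in training have df = 0 and get the largest idf.
            idf = math.log((1 + n_blocks) / (1 + doc_freq[cmd])) + 1.0
            vector[cmd] = math.log(1 + tf) * idf
        return vector

    return weigh


def cosine_distance(u, v):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(cmd, 0.0) for cmd, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty block as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def nn_anomaly_score(test_block, train_vectors, weigh):
    """Distance from the test block to its nearest training block."""
    test_vector = weigh(test_block)
    return min(cosine_distance(test_vector, t) for t in train_vectors)


# Hypothetical toy data: three blocks of commands from one user's normal sessions.
train_blocks = [
    ["ls", "cd", "vi", "ls"],
    ["cd", "ls", "make", "vi"],
    ["vi", "make", "ls", "ls"],
]
weigh = build_weigher(train_blocks)
train_vectors = [weigh(block) for block in train_blocks]

score_normal = nn_anomaly_score(["ls", "vi", "cd", "ls"], train_vectors, weigh)
score_masquerade = nn_anomaly_score(["nc", "wget", "chmod", "nc"], train_vectors, weigh)
print(score_normal, score_masquerade)
```

A block would be flagged as anomalous when its score exceeds a threshold chosen on held-out normal data. Because the nearest-neighbor step only compares against stored training vectors, there is no model fitting beyond computing the weights, which matches the abstract's remark that NN and k-NN need no training process.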