Redesign and implementation of evaluation dataset for intrusion detection system

  • Authors:
  • Jun Qian; Chao Xu; Meilin Shi

  • Affiliations:
  • Department of Computer Science, Tsinghua University, Beijing, P.R. China; Department of Computer Science, Tsinghua University, Beijing, P.R. China; Department of Computer Science, Tsinghua University, Beijing, P.R. China

  • Venue:
  • ETRICS'06 Proceedings of the 2006 international conference on Emerging Trends in Information and Communication Security
  • Year:
  • 2006

Abstract

Although the intrusion detection system (IDS) industry is rapidly maturing, the state of IDS evaluation is not. The off-line dataset evaluation proposed by MIT Lincoln Lab is a practical approach to evaluating IDS performance. While that evaluation dataset represents a significant undertaking, several issues in the design and modeling of the resulting dataset remain unsolved and may bias the evaluation results. Some researchers have noticed these problems and criticized the design and execution of the dataset, but no technical contribution toward a new dataset has been proposed. In this paper we present our efforts to redesign and generate a new dataset. First, we study how network applications and user behaviors characterize network traffic. Second, we improve the background traffic simulation (including HTTP, SMTP, POP, P2P, FTP, and other types of traffic). Unlike the existing model, our model simulates traffic at the user level rather than the packet level, which is more reasonable for background traffic modeling and simulation. Our model takes advantage of user-level web mining, automatic user profiling, and the Enron email dataset, among other sources. Experiments show the high fidelity of the simulated background traffic. Moreover, different kinds of attacker personalities are profiled, and more than 300 instances of 62 different automated attacks are launched against victim hosts and servers. All of these efforts aim to make the dataset more “real” and therefore fairer for IDS evaluation.
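
The abstract does not specify the authors' model, so the following is only a minimal sketch of what simulating background traffic at the user level (rather than the packet level) could look like. Every name here (`UserProfile`, `simulate_user`, `simulate_population`) is hypothetical, and the exponential think times and weighted protocol choice are assumptions made for illustration, not details taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical per-user profile; not the paper's actual data model."""
    name: str
    app_weights: dict        # protocol -> relative likelihood of the next action
    mean_think_time: float   # mean seconds between actions (assumed exponential)


def simulate_user(profile, duration, rng):
    """Return (timestamp, user, protocol) actions for one user within `duration` seconds."""
    protocols = list(profile.app_weights)
    weights = [profile.app_weights[p] for p in protocols]
    events = []
    # Each action is a user-level event (e.g. "fetch a page", "send a mail");
    # the packets would be produced by whatever real client carries it out.
    t = rng.expovariate(1.0 / profile.mean_think_time)
    while t < duration:
        protocol = rng.choices(protocols, weights=weights, k=1)[0]
        events.append((t, profile.name, protocol))
        t += rng.expovariate(1.0 / profile.mean_think_time)
    return events


def simulate_population(profiles, duration, seed=0):
    """Merge all users' actions into one time-ordered schedule."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    events = []
    for profile in profiles:
        events.extend(simulate_user(profile, duration, rng))
    return sorted(events)


if __name__ == "__main__":
    profiles = [
        UserProfile("alice", {"HTTP": 0.7, "SMTP": 0.2, "FTP": 0.1}, 30.0),
        UserProfile("bob", {"POP": 0.5, "P2P": 0.5}, 60.0),
    ]
    for timestamp, user, protocol in simulate_population(profiles, 600.0, seed=1)[:5]:
        print(f"{timestamp:8.1f}s  {user:6s} {protocol}")
```

In a sketch like this, the schedule would drive real client applications, so packet-level characteristics emerge from the protocol stacks instead of being modeled directly. Profiles mined from real users or from a corpus such as the Enron email dataset would take the place of the hand-written weights above.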