Knowledge Discovery from Honeypot Data for Monitoring Malicious Attacks

  • Authors:
  • Huidong Jin;Olivier de Vel;Ke Zhang;Nianjun Liu

  • Affiliations:
  • NICTA Canberra Lab, Locked Bag 8001, Canberra ACT, Australia 2601 and RSISE, the Australian National University, Canberra ACT, Australia 0200;Command, Control, Communications and Intelligence Division, DSTO, Edinburgh, Australia SA 5111;NICTA Canberra Lab, Locked Bag 8001, Canberra ACT, Australia 2601 and RSISE, the Australian National University, Canberra ACT, Australia 0200;NICTA Canberra Lab, Locked Bag 8001, Canberra ACT, Australia 2601 and RSISE, the Australian National University, Canberra ACT, Australia 0200

  • Venue:
  • AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
  • Year:
  • 2008

Abstract

Owing to the spread of worms and botnets, cyber attacks have significantly increased in volume, coordination and sophistication. Cheap, rentable botnet services, for example, have made sophisticated botnets an effective and popular tool for committing online crime. Honeypots are information system traps that monitor or deflect malicious attacks on the Internet. To understand the attack patterns generated by botnets through analysis of the data collected by honeypots, we propose an approach that integrates a clustering structure visualisation technique with outlier detection techniques. These techniques complement each other and provide end users with both a big-picture view and actionable knowledge of high-dimensional data. We introduce KNOF (K-nearest Neighbours Outlier Factor) as an outlier definition that reaches a trade-off between the global and local outlier definitions, i.e., Kth-Nearest Neighbour (KNN) and Local Outlier Factor (LOF), respectively. We propose an algorithm to discover the most significant KNOF outliers. We implement these techniques in our hpdAnalyzer tool, which has been used successfully to comprehend honeypot data. A series of experiments shows that our proposed KNOF technique substantially outperforms LOF and, to a lesser degree, KNN for real-world honeypot data.
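
To illustrate the two baseline definitions the abstract contrasts, the following is a minimal sketch of a global score (distance to the Kth nearest neighbour) and a local score (LOF). It does not implement KNOF, whose definition is not given in the abstract. The function names, the toy grid data and the choice k = 3 are assumptions made for illustration, and the LOF variant is simplified: it takes exactly k neighbours and ignores distance ties.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_distance_scores(X, k):
    """Global outlier score: distance from each point to its k-th nearest neighbour."""
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    return np.sort(D, axis=1)[:, k - 1]


def lof_scores(X, k):
    """Local outlier score: simplified LOF using exactly k neighbours per point.

    Assumes no duplicate points, since duplicates can make the local
    reachability density infinite.
    """
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)
    idx = np.argsort(D, axis=1)[:, :k]           # indices of the k nearest neighbours
    k_dist = np.sort(D, axis=1)[:, k - 1]        # k-distance of every point
    d_to_nbrs = np.take_along_axis(D, idx, axis=1)
    # reach_dist(p, o) = max(k_dist(o), d(p, o)) for each neighbour o of p
    reach = np.maximum(k_dist[idx], d_to_nbrs)
    lrd = 1.0 / reach.mean(axis=1)               # local reachability density
    return lrd[idx].mean(axis=1) / lrd           # mean neighbour density / own density


# Hypothetical toy data: a 5 x 5 unit grid plus one distant point (index 25).
grid = np.array([[i, j] for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([grid, [[10.0, 10.0]]])

knn = knn_distance_scores(X, k=3)
lof = lof_scores(X, k=3)
print("top KNN outlier:", int(np.argmax(knn)))
print("top LOF outlier:", int(np.argmax(lof)))
```

On this toy data both scores rank the distant point first. The two definitions tend to disagree when clusters of very different densities are present, which is the kind of situation a trade-off between global and local definitions is meant to address.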