Feature based techniques for auto-detection of novel email worms

  • Authors:
  • Mohammad M. Masud; Latifur Khan; Bhavani Thuraisingham

  • Affiliations:
  • Department of Computer Science, The University of Texas at Dallas, Richardson, Texas (all three authors)

  • Venue:
  • PAKDD'07: Proceedings of the 11th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2007

Abstract

This work applies data mining techniques to the detection of email worms. We use a feature-based detection technique in which features are extracted through statistical and behavioral analyses of the emails sent over a given period of time. Because this process yields too many features, our goal is to select the set of features that classification techniques can use to distinguish normal emails from viral ones efficiently. First, we apply Principal Component Analysis (PCA) to reduce the high dimensionality of the data and to obtain a projected, optimal set of attributes. We observe that applying PCA to a benchmark dataset improves the accuracy of detecting novel worms. Second, we apply the J48 decision tree algorithm to determine the relative importance of the features based on information gain. This lets us identify a subset of features, along with a set of classification rules, that detects novel worms better than either the original feature set or the PCA-reduced features. Finally, we compare our results with published results and discuss our plans to extend this work.
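The information-gain criterion mentioned above can be illustrated with a minimal pure-Python sketch. It ranks numeric features by the best information gain that a single binary split (`value <= t`) achieves, which is roughly how a decision tree chooses its most informative attribute. The feature names (`attachments`, `distinct_recipients`, `subject_len`) and the eight toy emails are hypothetical and are not taken from the paper's benchmark dataset. Note also that Weka's J48 (a C4.5 implementation) splits on gain ratio, a normalized form of information gain, so this sketch is a simplification rather than the authors' procedure.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold_gain(values, labels):
    """Best information gain over binary splits `value <= t` of one numeric feature.

    Candidate thresholds are midpoints between consecutive distinct values,
    a common way to handle continuous attributes in decision trees.
    """
    base = entropy(labels)
    n = len(labels)
    distinct = sorted(set(values))
    best_gain, best_t = 0.0, None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Weighted entropy remaining after the split.
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def rank_features(rows, labels, names):
    """Return (name, gain, threshold) tuples sorted by decreasing gain."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        gain, t = best_threshold_gain(column, labels)
        scores.append((name, gain, t))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical per-email features: (attachments, distinct_recipients, subject_len).
names = ["attachments", "distinct_recipients", "subject_len"]
rows = [
    (0, 1, 30), (1, 2, 25), (0, 1, 40), (1, 3, 35),      # clean
    (1, 40, 28), (1, 55, 33), (2, 60, 26), (1, 45, 38),  # worm
]
labels = ["clean"] * 4 + ["worm"] * 4

ranking = rank_features(rows, labels, names)
for name, gain, t in ranking:
    print(f"{name}: gain={gain:.3f} (split at <= {t})")
```

On this toy data, `distinct_recipients` separates the two classes perfectly and so ranks first with a gain of 1 bit. Features with low gain are candidates for removal, which mirrors, in a simplified form, the idea of keeping only the most informative subset.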