Clustering malware-generated spam emails with a novel fuzzy string matching algorithm

  • Authors:
  • Chun Wei; Alan Sprague; Gary Warner

  • Affiliations:
  • Univ. of Alabama at Birmingham, Birmingham, AL (all three authors)

  • Venue:
  • Proceedings of the 2009 ACM symposium on Applied Computing
  • Year:
  • 2009

Abstract

In this paper, a fuzzy-matching clustering algorithm is introduced to group the subjects of spam emails generated by malware. A modified scoring strategy is applied in dynamic programming to find subjects that are similar to each other. A recursive seed selection strategy allows the algorithm to detect similar patterns even when the spammer creates a variation of the original pattern. A sliding threshold based on string length helps to minimize false positives. The algorithm proves effective in detecting and grouping spam emails generated from templates. It also helps spam investigators collect and sort large amounts of malware-generated spam more efficiently without examining the email content.
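The abstract names three ingredients: a dynamic-programming similarity score, a length-dependent threshold, and seed-based grouping that tolerates variations of a pattern. The abstract does not give the paper's modified scoring weights, threshold values, or exact seed rule, so the sketch below is only an illustration under stated assumptions. It substitutes plain Levenshtein distance with unit costs for the modified scoring, uses a hypothetical linear threshold that is stricter for short subjects, and reads "recursive seed selection" as seed expansion, where every newly matched subject in turn becomes a seed. All names (`edit_distance`, `similarity`, `sliding_threshold`, `is_match`, `cluster_subjects`) and constants are hypothetical.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a two-row dynamic-programming table.

    Assumption: unit costs stand in for the paper's modified scoring strategy.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(b)]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def sliding_threshold(length: int) -> float:
    """Hypothetical threshold: short subjects must match more closely.

    Values 0.9 (length <= 10) and 0.7 (length >= 40) are illustrative only.
    """
    if length <= 10:
        return 0.9
    if length >= 40:
        return 0.7
    # Linear interpolation between 0.9 at length 10 and 0.7 at length 40.
    return 0.9 - 0.2 * (length - 10) / 30


def is_match(a: str, b: str) -> bool:
    """Compare against the threshold of the shorter subject (the stricter one)."""
    return similarity(a, b) >= sliding_threshold(min(len(a), len(b)))


def cluster_subjects(subjects: list[str]) -> list[list[str]]:
    """Seed expansion: each subject added to a cluster is then used as a seed,
    so chained variants (A ~ B, B ~ C) end up together even if A !~ C.
    The recursion is written as an explicit work queue."""
    unassigned = list(range(len(subjects)))
    clusters = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            still_unassigned = []
            for idx in unassigned:
                if is_match(subjects[current], subjects[idx]):
                    members.append(idx)
                    queue.append(idx)  # new member becomes a seed later
                else:
                    still_unassigned.append(idx)
            unassigned = still_unassigned
        clusters.append([subjects[i] for i in members])
    return clusters
```

For example, `cluster_subjects(["Your account has been suspended", "Cheap meds online now", "Your account has been suspended!", "Your acount has been suspended!!", "Cheap meds online now!!"])` should yield two groups, one per template. Comparing against the shorter subject's threshold is one way to keep short, generic subjects from absorbing unrelated mail, in the spirit of the abstract's goal of minimizing false positives.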