Mining spam email to identify common origins for forensic application

  • Authors:
  • Chun Wei;Alan Sprague;Gary Warner;Anthony Skjellum

  • Affiliations:
  • Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL

  • Venue:
  • Proceedings of the 2008 ACM symposium on Applied computing
  • Year:
  • 2008

Abstract

In recent years, spam email has become a major tool for criminals conducting illegal business on the Internet. In this paper we describe a new research approach that uses data mining techniques to study spam emails, with a focus on law enforcement forensic analysis. After retrieving useful attributes from spam emails, we use a connected components clustering algorithm to form relationships between messages. These initial clusters are then refined using a weighted edges model, in which membership in a cluster requires the weight to exceed a chosen threshold. Cluster membership is validated by WHOIS data, by the IP address of the computer hosting the advertised sites, and by comparing graphical images of website fetches. The technique has identified relationships between spam campaigns that human researchers had not found, enabling additional data to be brought into a single investigation.
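The two-stage clustering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the attribute names (`subject`, `domain`), the per-attribute weights, the threshold value, and the function names are all hypothetical, and reading the refinement step as "drop edges whose weight does not exceed the threshold, then recompute components" is an assumption about how the weighted edges model might work.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components with a small union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def cluster_spam(messages, weights, threshold):
    """Return (initial, refined) clusters for a dict of message attributes.

    Assumption: an edge's weight is the sum of the (hypothetical, positive)
    weights of the attributes on which two messages agree.
    """
    ids = sorted(messages)
    scored = {}
    for a, b in combinations(ids, 2):
        score = sum(
            w for attr, w in weights.items()
            if messages[a].get(attr) is not None
            and messages[a].get(attr) == messages[b].get(attr)
        )
        if score > 0:
            scored[(a, b)] = score

    # Stage 1: any shared attribute value links two messages.
    initial = connected_components(ids, scored.keys())
    # Stage 2: keep only edges whose weight exceeds the threshold.
    strong = [edge for edge, score in scored.items() if score > threshold]
    refined = connected_components(ids, strong)
    return initial, refined


# Hypothetical toy data: four messages with two extracted attributes each.
messages = {
    "m1": {"subject": "cheap pills", "domain": "pills-a.example"},
    "m2": {"subject": "cheap pills", "domain": "pills-b.example"},
    "m3": {"subject": "best watches", "domain": "pills-b.example"},
    "m4": {"subject": "best watches", "domain": "watch.example"},
}
weights = {"subject": 1.0, "domain": 2.0}  # assumed, not from the paper

initial, refined = cluster_spam(messages, weights, threshold=1.5)
print(initial)  # [['m1', 'm2', 'm3', 'm4']]
print(refined)  # [['m1'], ['m2', 'm3'], ['m4']]
```

In this toy run, shared subjects alone chain all four messages into one initial cluster, while the threshold keeps only the pair that shares an advertised domain. That is the kind of over-merging the refinement step is meant to correct, under the assumptions stated above.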