Mining spam email to identify common origins for forensic application

Authors:
Chun Wei;Alan Sprague;Gary Warner;Anthony Skjellum
Affiliations:
Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL;Univ. of Alabama at Birmingham, Birmingham, AL
Venue:
Proceedings of the 2008 ACM symposium on Applied computing
Year:
2008

Abstract

In recent years, spam email has become a major tool for criminals to conduct illegal business on the Internet. In this paper we describe a new research approach that uses data mining techniques to study spam emails, with a focus on law enforcement forensic analysis. After retrieving useful attributes from spam emails, we use a connected components clustering algorithm to form relationships between messages. These initial clusters are then refined using a weighted edges model, in which membership in a cluster requires the edge weight to exceed a chosen threshold. Cluster membership is validated against WHOIS data, against the IP address of the computer hosting the advertised sites, and by comparing graphical images of the fetched websites. This technique has been successful in identifying relationships between spam campaigns that were not identified by human researchers, enabling additional data to be brought into a single investigation.
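
The two clustering stages described in the abstract can be illustrated with a minimal sketch. It is one possible reading and not the authors' implementation. The attribute names (`subject`, `domain`), the weights in `WEIGHTS`, the sample messages, and the helper functions are all hypothetical, since the abstract does not specify which attributes are extracted or how edges are weighted. The sketch links messages that share any attribute value to form initial connected components, then keeps only edges whose weight exceeds a threshold and recomputes the components.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical attribute weights; the abstract does not give actual values.
WEIGHTS = {"subject": 1.0, "domain": 2.0}


def components(nodes, edges):
    """Return connected components as sorted lists, using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def edge_weight(m1, m2):
    """Sum the weights of the attributes on which two messages agree."""
    return sum(
        w for attr, w in WEIGHTS.items()
        if m1.get(attr) is not None and m1.get(attr) == m2.get(attr)
    )


def cluster(messages, threshold):
    """Return (initial, refined) clusterings of the message ids."""
    ids = list(messages)
    # Pairwise comparison is quadratic; acceptable for a small illustration.
    pairs = [
        (a, b, edge_weight(messages[a], messages[b]))
        for a, b in combinations(ids, 2)
    ]
    # Stage 1: any shared attribute value links two messages.
    initial = components(ids, [(a, b) for a, b, w in pairs if w > 0])
    # Stage 2: keep only edges whose weight exceeds the threshold.
    refined = components(ids, [(a, b) for a, b, w in pairs if w > threshold])
    return initial, refined


# Made-up sample data for illustration only.
messages = {
    "m1": {"subject": "cheap meds", "domain": "pills.example"},
    "m2": {"subject": "cheap meds", "domain": "pills.example"},
    "m3": {"subject": "cheap meds", "domain": "other.example"},
    "m4": {"subject": "win big", "domain": "casino.example"},
}

initial, refined = cluster(messages, threshold=1.5)
print(initial)
print(refined)
```

In this toy data, the shared subject alone is enough to place `m3` in the same initial component as `m1` and `m2`, but its edges carry only the subject weight, so it drops out once the threshold is applied. The validation steps mentioned in the abstract (WHOIS data, hosting IP addresses, website image comparison) are outside the scope of this sketch.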