In recent years, spam email has become a major tool for criminals conducting illegal business on the Internet. In this paper we describe a new research approach that applies data mining techniques to spam emails, with a focus on forensic analysis for law enforcement. After extracting useful attributes from spam emails, we use a connected-components clustering algorithm to form relationships between messages. These initial clusters are then refined using a weighted-edge model, in which membership in a cluster requires the edge weight to exceed a chosen threshold. Cluster membership is validated using WHOIS data, the IP addresses of the computers hosting the advertised websites, and comparison of images captured when fetching those websites. This technique has successfully identified relationships between spam campaigns that human researchers had not identified, allowing additional data to be brought into a single investigation.
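The abstract does not specify which attributes, weights, or threshold are used, so the following is only a minimal Python sketch of the two clustering steps, not the authors' implementation. It assumes each message has already been reduced to a dictionary of extracted attributes. The attribute names (`subject`, `sender_domain`, `url_host`), the weights in `WEIGHTS`, the toy messages, and the helper names `edge_weight` and `cluster` are all hypothetical. Two messages are linked when the summed weight of their matching attributes exceeds a threshold, and clusters are the connected components of the resulting graph, found with a union-find structure. A threshold of 0 links messages that share any attribute, which plays the role of the initial connected-components pass.

```python
from itertools import combinations

# Hypothetical per-attribute weights (assumption: the abstract gives none).
WEIGHTS = {"subject": 1.0, "sender_domain": 2.0, "url_host": 3.0}


class UnionFind:
    """Disjoint-set forest used to collect connected components."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def edge_weight(m1, m2, weights):
    """Sum the weights of attributes whose (non-missing) values match."""
    return sum(w for attr, w in weights.items()
               if m1.get(attr) is not None and m1.get(attr) == m2.get(attr))


def cluster(messages, weights, threshold):
    """Connected components over edges whose weight exceeds `threshold`.

    With threshold=0, any single shared attribute links two messages,
    i.e. plain connected-components clustering.
    """
    uf = UnionFind(len(messages))
    for i, j in combinations(range(len(messages)), 2):
        if edge_weight(messages[i], messages[j], weights) > threshold:
            uf.union(i, j)
    groups = {}
    for i in range(len(messages)):
        groups.setdefault(uf.find(i), []).append(i)
    # Sort for deterministic output.
    return sorted(sorted(g) for g in groups.values())


# Toy messages with hypothetical attribute values.
messages = [
    {"subject": "Cheap meds", "sender_domain": "a.example", "url_host": "pills.example"},
    {"subject": "Cheap meds", "sender_domain": "b.example", "url_host": "other.example"},
    {"subject": "Hello", "sender_domain": "c.example", "url_host": "pills.example"},
    {"subject": "Win big", "sender_domain": "d.example", "url_host": "casino.example"},
]

print(cluster(messages, WEIGHTS, threshold=0))    # [[0, 1, 2], [3]]
print(cluster(messages, WEIGHTS, threshold=1.5))  # [[0, 2], [1], [3]]
```

Because every edge whose weight exceeds a positive threshold also exceeds 0, the refined clusters are always nested inside the initial connected components; running the thresholded pass over all messages therefore gives the same result as refining each initial cluster separately. The pairwise loop is quadratic in the number of messages, and indexing messages by attribute value would avoid comparing pairs that share nothing.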