A novel approach of mining write-prints for authorship attribution in e-mail forensics

Authors:
Farkhund Iqbal; Rachid Hadjidj; Benjamin C. M. Fung; Mourad Debbabi
Affiliations:
Concordia Institute for Information Systems Engineering, Faculty of Engineering and Computer Science, Concordia University, Montreal, Quebec, Canada H3G 1M8 (all four authors)
Venue:
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Year:
2008

Abstract

There is an alarming increase in the number of cybercrime incidents committed through anonymous e-mails. The problem of e-mail authorship attribution is to identify the most plausible author of an anonymous e-mail from a group of potential suspects. Most previous contributions employed a traditional classification approach, such as decision trees or Support Vector Machines (SVM), to identify the author, and studied the effects of different writing style features on classification accuracy. However, little attention has been given to ensuring the quality of the evidence. In this paper, we introduce an innovative data mining method that captures the write-print of every suspect and models it as combinations of features that occur frequently in the suspect's e-mails. This notion is called a frequent pattern. It has proven effective in many data mining applications, but this is the first time it has been applied to the problem of authorship attribution. Unlike the traditional approach, the write-print extracted by our method is unique among the suspects and, therefore, provides convincing and credible evidence for presentation in a court of law. Experiments on real-life e-mails suggest that the proposed method can effectively identify the author and that the results are supported by strong evidence.
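The idea of a write-print as a set of frequent feature combinations that no other suspect shares can be illustrated with a small sketch. This is not the authors' implementation. It assumes each e-mail has already been reduced to a set of discretized style items (the item names below, such as `greeting=hi`, are hypothetical), it enumerates combinations by brute force rather than with an efficient frequent-pattern miner, and the scoring rule in `attribute` is an assumption made for illustration only.

```python
from itertools import combinations


def frequent_patterns(emails, min_support, max_size=2):
    """Return {pattern: support} for item combinations found in at least
    a min_support fraction of one suspect's e-mails (brute-force count)."""
    counts = {}
    for items in emails:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    n = len(emails)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def write_prints(emails_by_suspect, min_support=0.6, max_size=2):
    """Keep, per suspect, only the frequent patterns that are NOT frequent
    for any other suspect, so each write-print is unique in the group."""
    frequent = {
        s: frequent_patterns(e, min_support, max_size)
        for s, e in emails_by_suspect.items()
    }
    prints = {}
    for suspect, patterns in frequent.items():
        others = set()
        for other, other_patterns in frequent.items():
            if other != suspect:
                others |= set(other_patterns)
        prints[suspect] = {p: s for p, s in patterns.items() if p not in others}
    return prints


def attribute(anonymous_items, prints):
    """Assumed scoring rule: sum the support of every write-print pattern
    fully contained in the anonymous e-mail; the highest score wins."""
    scores = {
        suspect: sum(s for p, s in patterns.items() if p <= anonymous_items)
        for suspect, patterns in prints.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: hypothetical discretized style items per e-mail.
emails_by_suspect = {
    "suspect_a": [
        {"greeting=hi", "sentence_len=short", "signoff=cheers"},
        {"greeting=hi", "sentence_len=short", "signoff=none"},
        {"greeting=hi", "sentence_len=short", "signoff=cheers"},
    ],
    "suspect_b": [
        {"greeting=dear", "sentence_len=long", "signoff=regards"},
        {"greeting=dear", "sentence_len=short", "signoff=regards"},
        {"greeting=dear", "sentence_len=short", "signoff=regards"},
    ],
}

prints = write_prints(emails_by_suspect, min_support=0.6)
anonymous = {"greeting=hi", "sentence_len=short", "signoff=regards"}
best, scores = attribute(anonymous, prints)
print(best, scores)
```

In this toy data the single item `sentence_len=short` is frequent for both suspects, so it is dropped from both write-prints, while the combination of `greeting=hi` with `sentence_len=short` remains specific to `suspect_a`. The matched patterns themselves, not just a class label, are what could be shown as supporting evidence.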