A unified data mining solution for authorship analysis in anonymous textual communications

Authors:
Farkhund Iqbal; Hamad Binsalleeh; Benjamin C. M. Fung; Mourad Debbabi
Affiliations:
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8 (all authors)
Venue:
Information Sciences: an International Journal
Year:
2013



Abstract

The cyber world provides an anonymous environment for criminals to conduct malicious activities such as spamming, sending ransom e-mails, and spreading botnet malware. Often, these activities involve textual communication between a criminal and a victim, or among the criminals themselves. The forensic analysis of online textual documents to address this anonymity problem, called authorship analysis, is the focus of most cybercrime investigations. Authorship analysis is the statistical study of the linguistic and computational characteristics of the written documents of individuals. This paper is the first to present a unified data mining solution to authorship analysis problems, based on the concept of a frequent pattern-based writeprint. Extensive experiments on real-life data suggest that our proposed solution can precisely capture the writing styles of individuals. Furthermore, the writeprint is effective both for identifying the author of an anonymous text from a group of suspects and for inferring sociolinguistic characteristics of the author.
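The abstract names the core idea, a frequent pattern-based writeprint, without spelling out the mechanics. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. Each message is mapped to a set of discrete stylometric items by a hypothetical `extract_features` that only checks a handful of function words and punctuation marks. Frequent itemsets are then mined per suspect by level-wise enumeration. Any pattern that is also frequent for another suspect is discarded, so the remainder is unique to one author. Finally, an anonymous message is attributed to the suspect whose remaining patterns it matches with the highest average support. The feature set, the `min_support` threshold, the scoring rule, and the toy corpus are all illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical feature vocabulary, for illustration only. A real system would
# use a much richer set of lexical, syntactic, structural and content features.
FUNCTION_WORDS = {"the", "a", "of", "to", "and", "in", "that", "it", "is", "i"}
PUNCTUATION = {"!": "exclaim", "?": "question", ",": "comma", ";": "semicolon"}

Pattern = FrozenSet[str]


def extract_features(message: str) -> Set[str]:
    """Map a message to a set of discrete stylometric items (assumed encoding)."""
    words = message.lower().split()
    items = {"fw:" + w.strip(".,!?;:") for w in words
             if w.strip(".,!?;:") in FUNCTION_WORDS}
    items |= {"punct:" + name for ch, name in PUNCTUATION.items() if ch in message}
    if len(words) > 12:  # hypothetical "long message" threshold
        items.add("len:long")
    return items


def frequent_patterns(transactions: List[Set[str]], min_support: float,
                      max_size: int = 3) -> Dict[Pattern, float]:
    """Level-wise enumeration of itemsets whose relative support >= min_support."""
    n = len(transactions)
    result: Dict[Pattern, float] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        frequent: Dict[Pattern, float] = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        # Every frequent (k+1)-itemset is the union of two frequent k-itemsets,
        # so joining frequent k-itemsets yields a complete candidate set.
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == size + 1}
        size += 1
    return result


def writeprints(corpus: Dict[str, List[str]], min_support: float
                ) -> Dict[str, Dict[Pattern, float]]:
    """Keep, for each suspect, only the patterns frequent for no other suspect."""
    fps = {author: frequent_patterns([extract_features(m) for m in msgs], min_support)
           for author, msgs in corpus.items()}
    result = {}
    for author, patterns in fps.items():
        shared = set().union(*(set(p) for a, p in fps.items() if a != author))
        result[author] = {p: s for p, s in patterns.items() if p not in shared}
    return result


def attribute(message: str, wps: Dict[str, Dict[Pattern, float]]) -> str:
    """Return the suspect whose writeprint patterns the message matches best.

    Score = summed support of matched patterns / writeprint size (an assumed rule).
    """
    items = extract_features(message)

    def score(author: str) -> float:
        wp = wps[author]
        if not wp:
            return 0.0
        return sum(s for p, s in wp.items() if p <= items) / len(wp)

    return max(wps, key=score)


# Toy corpus, invented for illustration.
corpus = {
    "alice": ["I love the show!", "the cake is great!", "Wow, the view!"],
    "bob": ["Did you go to it?", "Is it done? yes; maybe", "Should we go to it?"],
}
wps = writeprints(corpus, min_support=0.6)
print(attribute("What a day, the best!", wps))   # prints alice
print(attribute("Can you send it to me?", wps))  # prints bob
```

On the toy corpus, Alice's writeprint is built from `the` and `!`, and Bob's from `to`, `it`, and `?`, so each test message lands on the expected suspect. Discarding every pattern that is frequent for another suspect is a strict uniqueness filter. With many suspects or noisy data, a softer criterion may be preferable.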