A unified data mining solution for authorship analysis in anonymous textual communications

  • Authors:
  • Farkhund Iqbal; Hamad Binsalleeh; Benjamin C. M. Fung; Mourad Debbabi

  • Affiliations:
  • Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8 (all four authors)

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Abstract

The cyber world provides an anonymous environment in which criminals can conduct malicious activities such as spamming, sending ransom e-mails, and spreading botnet malware. Often, these activities involve textual communication between a criminal and a victim, or between criminals themselves. The forensic analysis of online textual documents to address this anonymity problem, called authorship analysis, is the focus of most cybercrime investigations. Authorship analysis is the statistical study of the linguistic and computational characteristics of documents written by individuals. This paper is the first work to present a unified data mining solution that addresses authorship analysis problems based on the concept of frequent pattern-based writeprint. Extensive experiments on real-life data suggest that our proposed solution can precisely capture the writing styles of individuals. Furthermore, the writeprint is effective in identifying the author of an anonymous text from a group of suspects and in inferring sociolinguistic characteristics of the author.