Forensic Authorship Attribution Using Compression Distances to Prototypes

  • Authors:
  • Maarten Lambers; Cor J. Veenman

  • Affiliations:
  • Digital Technology & Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands; Intelligent Systems Lab, University of Amsterdam, Amsterdam, The Netherlands and Digital Technology & Biometrics Department, Netherlands Forensic Institute, The Hague, The Netherlands

  • Venue:
  • IWCF '09 Proceedings of the 3rd International Workshop on Computational Forensics
  • Year:
  • 2009

Abstract

Authors sometimes prefer to hide their identity. In forensic applications, examples include extortion and threats sent by email or posted in forum messages. Such messages are easily manipulated, so metadata referring to names and addresses is unreliable at best. In this paper, we propose a method to identify the authors of short informal messages based solely on their text content. The method uses compression distances between texts as features. A supervised classifier is then trained on these features, using a training set of messages from known authors. For the experiments, we prepared a dataset from Dutch newsgroup texts. We compared our proposed method with several state-of-the-art methods on identifying the authors of messages from up to 50 authors. Our method clearly outperformed the other methods: in 65% of the cases the author was correctly identified, and in 88% of the cases the true author was in the top 5 of the produced ranked list.
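
The idea of using compression distances to prototype texts as features can be illustrated with a minimal Python sketch. The abstract does not say which compression distance, compressor, prototype selection, or classifier the authors use, so everything below is an assumption made for illustration: the Normalized Compression Distance (NCD) computed with zlib, a caller-supplied list of prototype texts, and a nearest-centroid ranking as a simple stand-in for the supervised classifier. The names `csize`, `ncd`, `features`, and `rank_authors` are hypothetical.

```python
import math
import zlib


def csize(text: str) -> int:
    """Compressed size in bytes, using zlib at its highest compression level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance (assumed here; not confirmed by the abstract)."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def features(text: str, prototypes: list[str]) -> list[float]:
    """Represent a text by its compression distances to a fixed list of prototypes."""
    return [ncd(text, p) for p in prototypes]


def rank_authors(message: str, training: dict[str, list[str]], prototypes: list[str]) -> list[str]:
    """Rank candidate authors from most to least likely for a message.

    training maps each known author to a list of that author's texts.
    Nearest-centroid ranking is a stand-in for the paper's classifier.
    """
    query = features(message, prototypes)
    scores = {}
    for author, texts in training.items():
        vectors = [features(t, prototypes) for t in texts]
        # Mean feature vector of this author's training texts.
        centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
        scores[author] = math.dist(query, centroid)
    # A smaller distance to the centroid means a better match.
    return sorted(scores, key=scores.get)
```

Under these assumptions, `rank_authors(message, training, prototypes)[:5]` yields a top-5 candidate list, which is the kind of ranked output the reported top-5 figure refers to. On very short messages the fixed overhead of zlib can dominate the compressed size, so a real system would likely need a compressor and distance chosen with that in mind.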