Authorship Attribution with Support Vector Machines

  • Authors:
  • Joachim Diederich;Jörg Kindermann;Edda Leopold;Gerhard Paass

  • Affiliations:
  • School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, Q-4072, Australia;GMD—Forschungszentrum Informationstechnik, D-52754 Sankt Augustin;GMD—Forschungszentrum Informationstechnik, D-52754 Sankt Augustin;GMD—Forschungszentrum Informationstechnik, D-52754 Sankt Augustin

  • Venue:
  • Applied Intelligence
  • Year:
  • 2003

Abstract

In this paper we explore the use of text-mining methods for identifying the author of a text. We apply the support vector machine (SVM) to this problem because it can cope with half a million inputs, requires no feature selection, and can process the frequency vector of all words in a text. We performed a number of experiments with texts from a German newspaper. With nearly perfect reliability the SVM rejected texts by other authors, and it detected the target author in 60–80% of the cases. In a second experiment, we ignored nouns, verbs and adjectives and replaced them with grammatical tags and bigrams. This resulted in slightly reduced performance. Author detection with SVMs on full word forms was remarkably robust even when the author wrote about different topics.
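
As an illustration of the input representation the abstract describes (the frequency vector of all word forms in a text, with no feature selection), the following is a minimal sketch and not the authors' implementation. The function names, the regular-expression tokenizer, the normalization by text length, and the two sample sentences are all assumptions made for this example; the abstract does not specify any of them.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into full word forms (assumed tokenizer; case is preserved)."""
    return re.findall(r"\w+", text)


def build_vocabulary(texts):
    """Assign a column index to every distinct word form; nothing is filtered out."""
    word_forms = sorted({token for text in texts for token in tokenize(text)})
    return {word: index for index, word in enumerate(word_forms)}


def frequency_vector(text, vocabulary):
    """Relative frequency of each vocabulary word in one text (assumed normalization)."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for an empty text
    vector = [0.0] * len(vocabulary)
    for word, count in counts.items():
        index = vocabulary.get(word)
        if index is not None:  # words unseen when the vocabulary was built are skipped
            vector[index] = count / total
    return vector


# Hypothetical toy corpus; the experiments in the paper used German newspaper texts.
texts = ["the cat sat on the mat", "the dog barked"]
vocabulary = build_vocabulary(texts)
vectors = [frequency_vector(text, vocabulary) for text in texts]
```

Each text becomes one row with one column per word form in the corpus. Vectors of this kind, labeled as "target author" or "other author", would then be passed to an SVM implementation for training, a step this sketch leaves out.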