Authorship attribution of electronic documents comparing the use of normalized compression distance and support vector machine in authorship attribution

  • Authors:
  • Walter Ribeiro de Oliveira;Edson J. R. Justino;Luiz S. Oliveira

  • Affiliations:
  • Pontifícia Universidade Católica do Paraná --- PUC-PR, Curitiba, Brazil;Pontifícia Universidade Católica do Paraná --- PUC-PR, Curitiba, Brazil;Universidade Federal do Paraná --- UFPR, Curitiba, Brazil

  • Venue:
  • ICONIP'12 Proceedings of the 19th international conference on Neural Information Processing - Volume Part I
  • Year:
  • 2012

Abstract

Classifiers make it possible to automatically attribute the subject of a text and even its authorship. Previous studies used function words and a Support Vector Machine (SVM) to accomplish this task. We instead use a data compressor-based approach built on a document similarity metric called Normalized Compression Distance (NCD). Tests were performed on the same database used in a previous work, composed of 3,000 documents by 100 different authors, to allow comparison of the results. The results show that NCD can perform slightly better on this task, depending on the compressor used.
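
To make the metric concrete, here is a minimal sketch of NCD in Python, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size in bytes. The abstract does not name the compressors that were evaluated, so `zlib` is only a stand-in. The nearest-neighbor decision rule in the hypothetical `attribute` helper is likewise an assumption for illustration, not necessarily the procedure used in the paper.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after compression (zlib is an assumed stand-in)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the texts share much structure; values near 1
    suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both texts
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(unknown: bytes, known: Mapping[str, bytes]) -> str:
    """Hypothetical helper: return the author whose reference text is closest.

    Assumption: one reference text per author and a nearest-neighbor decision.
    """
    return min(known, key=lambda author: ncd(unknown, known[author]))
```

A call such as `attribute(document_bytes, {"author_a": sample_a, "author_b": sample_b})` returns the label of the closest reference sample. Because the distance depends entirely on how well the compressor models the concatenated texts, swapping `zlib` for another compressor can change the ranking, which is consistent with the abstract's remark that performance depends on the compressor used.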