The use of orthogonal similarity relations in the prediction of authorship

  • Authors:
  • Upendra Sapkota;Thamar Solorio;Manuel Montes-y-Gómez;Paolo Rosso

  • Affiliations:
  • University of Alabama at Birmingham, Birmingham, AL;University of Alabama at Birmingham, Birmingham, AL;Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico;NLE Lab - ELiRF, Universitat Politècnica de València, Valencia, Spain

  • Venue:
  • CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2
  • Year:
  • 2013

Abstract

Recent work on Authorship Attribution (AA) proposes the use of meta characteristics to train author models. The meta characteristics are orthogonal sets of similarity relations between the features from the different candidate authors. In that approach, the features are grouped and processed separately according to the type of information they encode, the so-called linguistic modalities. For instance, syntactic, stylistic and semantic features are each considered a different modality, as they represent different aspects of the texts. The assumption is that the independent extraction of meta characteristics results in more informative feature vectors, which in turn result in higher accuracies. In this paper we set out to study the empirical value of this modality-specific process. We experimented with different ways of generating the meta characteristics on different data sets with different numbers of authors and genres. Our results show that extracting the meta characteristics after splitting the features by their linguistic dimension yields a consistent improvement in prediction accuracy.
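
The contrast between modality-specific and joint extraction of meta characteristics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which similarity relations are used or how author models are built, so cosine similarity to a per-author centroid is an assumption made here for illustration, and all function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_profiles(train_docs):
    """train_docs: list of (author, {modality: vector}).
    Returns {author: {modality: centroid vector}} (assumed author model)."""
    grouped = {}
    for author, feats in train_docs:
        for modality, vec in feats.items():
            grouped.setdefault(author, {}).setdefault(modality, []).append(vec)
    return {a: {m: centroid(vs) for m, vs in mods.items()}
            for a, mods in grouped.items()}

def meta_per_modality(doc_feats, profiles):
    """One similarity per (modality, author): modalities are processed separately."""
    authors = sorted(profiles)
    return [cosine(doc_feats[m], profiles[a][m])
            for m in sorted(doc_feats) for a in authors]

def meta_joint(doc_feats, profiles):
    """One similarity per author over all features concatenated (no modality split)."""
    modalities = sorted(doc_feats)
    flat_doc = [x for m in modalities for x in doc_feats[m]]
    result = []
    for a in sorted(profiles):
        flat_profile = [x for m in modalities for x in profiles[a][m]]
        result.append(cosine(flat_doc, flat_profile))
    return result

# Toy data: two authors, two modalities, two features per modality (all hypothetical).
train_docs = [
    ("A", {"stylistic": [1.0, 0.0], "syntactic": [0.0, 2.0]}),
    ("A", {"stylistic": [3.0, 0.0], "syntactic": [0.0, 4.0]}),
    ("B", {"stylistic": [0.0, 1.0], "syntactic": [1.0, 0.0]}),
]
profiles = build_profiles(train_docs)
doc = {"stylistic": [5.0, 0.0], "syntactic": [2.0, 0.0]}

per_modality = meta_per_modality(doc, profiles)  # 2 modalities x 2 authors
joint = meta_joint(doc, profiles)                # 1 value per author
```

In this toy case the per-modality vector has four entries and the joint vector has two. The document matches author A on the stylistic features and author B on the syntactic ones, a distinction that the single joint similarity per author blurs. Either vector could then be passed to any standard classifier; which classifier the paper uses is not stated in the abstract.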