Feature selections for authorship attribution

  • Authors:
  • Jacques Savoy

  • Affiliations:
  • University of Neuchâtel, Neuchâtel, Switzerland

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013

Abstract

The authorship attribution (AA) problem can be viewed as a text categorization problem. To determine the most effective features for discriminating between authors, we evaluated six feature-scoring selection functions: information gain, pointwise mutual information, odds ratio, χ², DIA, and document frequency (df). To compare these approaches, we selected sports articles from an Italian newspaper corpus (La Stampa). Using the Kullback-Leibler (KL) divergence [1] as the attribution scheme, we found that the simple df selection strategy tends to achieve high performance levels, comparable to those of the more complex scoring functions.
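The abstract names df-based feature selection and KL-divergence attribution but does not give their details. The Python below is a minimal sketch under our own assumptions, not the paper's implementation. It assumes lowercase whitespace tokenization, additive smoothing with a hypothetical parameter `alpha`, max-over-authors aggregation for the χ² score, and the divergence direction D(query || author). Every function name is hypothetical. It contrasts a label-free criterion (df) with a supervised one (χ²) inside the same KL attribution step.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()


def document_frequency(docs):
    """df(t): number of documents in which term t occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def chi_square(term, docs, labels, author):
    """2x2 chi-square statistic between presence of `term` and class `author`."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in set(tokenize(doc))
        in_class = label == author
        if present and in_class:
            a += 1
        elif present:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_k(scores, k):
    # Rank by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def select_by_df(docs, k):
    return top_k(document_frequency(docs), k)


def select_by_chi2(docs, labels, k):
    # Aggregate per-author chi-square with max (one common choice, assumed here).
    authors = set(labels)
    scores = {t: max(chi_square(t, docs, labels, a) for a in authors)
              for t in document_frequency(docs)}
    return top_k(scores, k)


def profile(texts, vocab, alpha=0.1):
    """Additively smoothed term distribution over the selected vocabulary."""
    vocab_set = set(vocab)
    counts = Counter(t for text in texts for t in tokenize(text) if t in vocab_set)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """D(p || q) over a shared vocabulary; smoothing keeps every q[t] > 0."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def attribute(query, train_docs, train_labels, k=50, alpha=0.1, selector="df"):
    """Return the author whose profile has minimal D(query || author)."""
    if selector == "df":
        vocab = select_by_df(train_docs, k)
    else:
        vocab = select_by_chi2(train_docs, train_labels, k)
    by_author = {}
    for doc, label in zip(train_docs, train_labels):
        by_author.setdefault(label, []).append(doc)
    q = profile([query], vocab, alpha)
    scores = {author: kl_divergence(q, profile(texts, vocab, alpha))
              for author, texts in by_author.items()}
    return min(scores, key=scores.get), scores
```

Note that df needs no author labels and only one pass over the corpus. χ² and the other supervised scores need labelled counts for every term and author pair. Changing `selector` while keeping the attribution step fixed isolates the effect of feature selection, which is the kind of comparison the abstract reports.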