A new document author representation for authorship attribution

  • Authors:
  • Adrián Pastor López-Monroy; Manuel Montes-y-Gómez; Luis Villaseñor-Pineda; Jesús Ariel Carrasco-Ochoa; José Fco. Martínez-Trinidad

  • Affiliations:
  • Computer Science Department, National Institute for Astrophysics, Optics and Electronics, Tonantzintla, Puebla, Mexico (all five authors)

  • Venue:
  • MCPR'12: Proceedings of the 4th Mexican Conference on Pattern Recognition
  • Year:
  • 2012

Abstract

This paper proposes a novel representation for Authorship Attribution (AA) based on Concise Semantic Analysis (CSA), a technique that has been used successfully in Text Categorization (TC). Our approach, called Document Author Representation (DAR), builds document vectors in a space of authors by computing the relationship between textual features and authors. To evaluate it, we compare the proposed representation with conventional approaches and previous work on the C50 corpus. We found that DAR is useful for AA: it performs well on imbalanced data, achieving accuracy comparable to or better than that of the compared approaches.
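The abstract describes DAR only at a high level: each textual feature is related to every author, and each document becomes a vector with one dimension per author. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. Features are assumed to be word unigrams, and the term-author weight is assumed to be a CSA-inspired score, log2(1 + tf / author length), normalized to unit length per term. The exact weighting in the paper may differ, and the function names `author_term_matrix` and `document_author_vector` are hypothetical.

```python
import math
from collections import Counter


def author_term_matrix(docs, authors):
    """Relate every term to every author (hypothetical CSA-inspired weighting).

    docs:    list of token lists, one per training document
    authors: parallel list of author labels
    Returns (author_list, weights), where weights[term] is a unit-length
    vector with one entry per author in author_list.
    """
    author_list = sorted(set(authors))
    tf = {a: Counter() for a in author_list}  # term counts per author
    length = Counter()                         # total tokens per author
    for tokens, a in zip(docs, authors):
        tf[a].update(tokens)
        length[a] += len(tokens)

    weights = {}
    for term in {t for tokens in docs for t in tokens}:
        # Assumed weight: log2(1 + tf / author length), so authors with
        # more text do not dominate simply by volume.
        raw = [math.log2(1 + tf[a][term] / (length[a] or 1)) for a in author_list]
        # Normalize each term vector to unit length across authors.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        weights[term] = [x / norm for x in raw]
    return author_list, weights


def document_author_vector(tokens, author_list, weights):
    """Map a document into author space as a frequency-weighted sum of the
    author vectors of its known terms (unseen terms are ignored)."""
    counts = Counter(t for t in tokens if t in weights)
    total = sum(counts.values())
    vec = [0.0] * len(author_list)
    if total == 0:
        return vec
    for term, c in counts.items():
        for j, w in enumerate(weights[term]):
            vec[j] += (c / total) * w
    return vec


# Toy usage: two hypothetical authors, three short training "documents".
train = [["the", "market", "rose"],
         ["shares", "fell", "sharply"],
         ["the", "market", "fell"]]
labels = ["alice", "bob", "alice"]
authors, W = author_term_matrix(train, labels)
v = document_author_vector(["market", "rose"], authors, W)
print(dict(zip(authors, v)))
```

In this toy run the test document loads entirely on alice's dimension, because both of its terms were seen only in alice's training text. One consequence of this design is that the document vector's length is fixed by the number of candidate authors rather than by the vocabulary size. In a real AA setting such vectors would typically be passed to a standard classifier; which classifier the paper uses is not stated in the abstract.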