Comparing sentence-level features for authorship analysis in Portuguese

  • Authors:
  • Rui Sousa-Silva; Luís Sarmento; Tim Grant; Eugénio Oliveira; Belinda Maia

  • Affiliations:
  • Centre for Forensic Linguistics at Aston University; Faculdade de Engenharia da Universidade do Porto - DEI - LIACC; Centre for Forensic Linguistics at Aston University; Faculdade de Engenharia da Universidade do Porto - DEI - LIACC; CLUP - Centro de Linguística da Universidade do Porto

  • Venue:
  • PROPOR'10 Proceedings of the 9th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2010

Abstract

In this paper we compare the robustness of several types of stylistic markers for discriminating authorship at the sentence level. We train an SVM-based classifier on each set of features separately and perform sentence-level authorship analysis over a corpus of editorials published in a Portuguese quality newspaper. Results show that features based on POS information, punctuation, and word/sentence length contribute to more robust sentence-level authorship analysis.
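The abstract names punctuation and word/sentence length among the feature types but does not define them precisely. The following is a minimal, hypothetical sketch of how such per-sentence features might be computed before being passed to a classifier. The function name `sentence_features`, the punctuation inventory, and the normalisation by character count are all illustrative assumptions, not the authors' feature definitions. POS-based features are omitted because they would require a Portuguese tagger.

```python
import string

# Hypothetical punctuation inventory (an assumption, not the paper's list);
# includes the guillemets commonly used in Portuguese text.
PUNCTUATION = ",.;:!?\"'()-«»"
STRIP_CHARS = string.punctuation + "«»…"


def sentence_features(sentence: str) -> dict[str, float]:
    """Map one sentence to illustrative length and punctuation features."""
    # Whitespace tokenisation, then strip surrounding punctuation from tokens
    tokens = sentence.split()
    words = [t.strip(STRIP_CHARS) for t in tokens]
    words = [w for w in words if w]
    n_words = len(words)

    features = {
        "sentence_length_words": float(n_words),
        "sentence_length_chars": float(len(sentence)),
        "mean_word_length": (
            sum(len(w) for w in words) / n_words if n_words else 0.0
        ),
    }

    # Relative frequency of each punctuation mark, normalised by sentence length
    n_chars = len(sentence) or 1
    for mark in PUNCTUATION:
        features[f"punct_{mark}"] = sentence.count(mark) / n_chars
    return features


if __name__ == "__main__":
    print(sentence_features("O Governo decidiu, finalmente, agir."))
```

Each sentence becomes a fixed-length numeric vector (one entry per feature key), which is the kind of input an SVM expects. One option would be to convert these dictionaries to arrays and train a linear SVM per feature set, mirroring the per-feature-set comparison described in the abstract.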