Authorship attribution with latent Dirichlet allocation

  • Authors:
  • Yanir Seroussi; Ingrid Zukerman; Fabian Bohnert

  • Affiliations:
  • Monash University, Clayton, Victoria, Australia; Monash University, Clayton, Victoria, Australia; Monash University, Clayton, Victoria, Australia

  • Venue:
  • CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning
  • Year:
  • 2011

Abstract

The problem of authorship attribution -- attributing texts to their original authors -- has been an active research area since the end of the 19th century, and has attracted increased interest in the last decade. Most work on authorship attribution focuses on scenarios with only a few candidate authors, but recently considered cases with tens to thousands of candidate authors have been found to be much more challenging. In this paper, we propose ways of employing Latent Dirichlet Allocation in authorship attribution. We show that our approach yields state-of-the-art performance for both a few and many candidate authors, in cases where these authors wrote enough texts to be modelled effectively.