Authorship Identification for Online Text

  • Authors:
  • Richmond Hong Rui Tan; Flora S. Tsai

  • Venue:
  • CW '10 Proceedings of the 2010 International Conference on Cyberworlds
  • Year:
  • 2010

Abstract

Authorship identification for online text such as blogs and e-books is a challenging problem because these documents typically contain only a small amount of content. Identification is therefore much harder than for other documents such as books and reports. The paper investigates which choice of features is suitable for such texts and the classifier accuracy that results. Syntactic features are found to work well for large data sets, whereas lexical features work well for small data sets. The results can be used to customize and further improve authorship detection techniques according to the characteristics of the writing samples.
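
The abstract does not list the exact features or classifiers used, so the following is only a minimal sketch of the two feature families it contrasts. It assumes that average word length and type-token ratio stand in for lexical features, and that function-word and punctuation rates stand in for syntactic features, a common grouping in stylometry that may differ from the paper's. The function names, the function-word list, and the punctuation set are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short lists; NOT the paper's feature set.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "it", "for")
PUNCTUATION = ",.;:!?"


def tokenize(text):
    """Return lowercase word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Word-level statistics: average word length and type-token ratio."""
    words = tokenize(text)
    n = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,
    }


def syntactic_features(text):
    """Style markers: function-word rate per token, punctuation rate per character."""
    words = tokenize(text)
    n = len(words) or 1
    counts = Counter(words)
    feats = {f"fw_{w}": counts[w] / n for w in FUNCTION_WORDS}
    chars = len(text) or 1
    for p in PUNCTUATION:
        feats[f"punct_{p}"] = text.count(p) / chars
    return feats


if __name__ == "__main__":
    sample = "The cat sat on the mat. It is a cat, and it is the cat."
    print(lexical_features(sample))
    print(syntactic_features(sample))
```

Either dictionary could be turned into a fixed-order numeric vector and passed to any standard classifier. Which family performs better for a given corpus size is the empirical question the paper addresses, and this sketch makes no claim about it.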