Author attribution of Turkish texts by feature mining

  • Authors:
  • Filiz Türkoğlu; Banu Diri; M. Fatih Amasyali

  • Affiliations:
  • Yildiz Technical University, Computer Engineering, Istanbul, Turkey; Yildiz Technical University, Computer Engineering, Istanbul, Turkey; Yildiz Technical University, Computer Engineering, Istanbul, Turkey

  • Venue:
  • ICIC'07: Proceedings of the 3rd International Conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications
  • Year:
  • 2007

Abstract

The aim of this study is to identify the author of a document whose authorship is unknown. Ten feature vectors are extracted from the documents whose authors are to be identified: authorship attributes, n-grams, and various combinations of the two. The performance of each feature vector is compared using Naïve Bayes, SVM, k-NN, RF and MLP classifiers. The most successful classifiers are MLP and SVM. In document classification, n-grams give higher accuracy than authorship attributes. Nevertheless, using n-grams and authorship attributes together gives better results than using either alone.
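
The idea of combining n-gram features with authorship attributes can be sketched as follows. This is only a minimal illustration, not the paper's feature set: the abstract does not list the attributes used, so the three stylometric attributes, the use of character bigrams, and all function names below are assumptions. The resulting vectors are what a classifier such as SVM or MLP would then be trained on.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in the lowercased text.

    Note: str.lower() is not Turkish-aware (it maps 'I' to 'i', not 'ı').
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def style_attributes(text):
    """A few illustrative authorship attributes (hypothetical choice)."""
    words = text.split()
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "comma_rate": text.count(",") / max(len(words), 1),
    }


def combined_vector(text, vocabulary, n=2):
    """Concatenate relative n-gram frequencies with the style attributes."""
    counts = char_ngrams(text, n)
    total = max(sum(counts.values()), 1)
    ngram_part = [counts[g] / total for g in vocabulary]
    style_part = list(style_attributes(text).values())
    return ngram_part + style_part


# Usage: build a shared n-gram vocabulary, then one vector per document.
docs = [
    "Bu bir deneme metnidir. Kisa, basit bir ornek.",
    "Diger yazarin metni daha uzun cumleler icerir, virguller de var.",
]
vocabulary = sorted(set().union(*(char_ngrams(d) for d in docs)))
vectors = [combined_vector(d, vocabulary) for d in docs]
```

Each vector has one entry per vocabulary n-gram followed by the three style attributes, so dropping either part yields the n-gram-only or attribute-only variants for comparison.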