Author attribution of Turkish texts by feature mining

Authors:
Filiz Türkoğlu; Banu Diri; M. Fatih Amasyali
Affiliations:
Yıldız Technical University, Computer Engineering, Istanbul, Turkey; Yıldız Technical University, Computer Engineering, Istanbul, Turkey; Yıldız Technical University, Computer Engineering, Istanbul, Turkey
Venue:
ICIC'07 Proceedings of the intelligent computing 3rd international conference on Advanced intelligent computing theories and applications
Year:
2007

Abstract

The aim of this study is to identify the author of a document of unknown authorship. Ten different feature vectors are built from authorship attributes, n-grams, and various combinations of the two, all extracted from the documents whose authors are to be identified. The performance of each feature vector is compared using Naïve Bayes, support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and multilayer perceptron (MLP) classifiers. The most successful classifiers are MLP and SVM. In document classification, n-grams give higher accuracy than authorship attributes. Nevertheless, combining n-grams with authorship attributes gives better results than using either alone.
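
The idea of combining n-gram features with authorship attributes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual attributes, the n-gram type or order, or the classifier settings. The sketch assumes character bigrams, three hypothetical style attributes (average word length, words per sentence, comma rate), and a 1-nearest-neighbor classifier with cosine similarity standing in for the k-NN method named above. Each feature block is scaled to unit length before merging, which is also an assumption, so that neither block dominates the similarity.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Relative frequencies of character n-grams, keyed as 'ng:<gram>'."""
    text = text.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {"ng:" + g: c / total for g, c in grams.items()}


def style_attributes(text):
    """Three hypothetical authorship attributes (not the paper's list)."""
    words = text.split()
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return {
        "st:avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "st:words_per_sentence": len(words) / max(len(sentences), 1),
        "st:comma_rate": text.count(",") / max(len(words), 1),
    }


def unit(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def combined_features(text, n=2):
    """Merge both feature blocks; each is unit-scaled first (assumption)."""
    feats = unit(char_ngrams(text, n))
    feats.update(unit(style_attributes(text)))
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_author(text, training, featurize=combined_features):
    """1-NN: return the author of the most similar training document.

    `training` is a list of (author, text) pairs.
    """
    query = featurize(text)
    best = max(training, key=lambda item: cosine(query, featurize(item[1])))
    return best[0]
```

Passing `featurize=char_ngrams` or `featurize=style_attributes` to `predict_author` runs the same classifier on a single feature family, which mirrors the kind of comparison the abstract describes. A realistic experiment would need a labeled corpus, held-out evaluation, and the stronger classifiers named in the abstract.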