Automatic Turkish text categorization in terms of author, genre and gender

  • Authors:
  • M. Fatih Amasyalı; Banu Diri

  • Affiliations:
  • Computer Engineering Department, Yıldız Technical University, Beşiktaş, İstanbul, Turkey; Computer Engineering Department, Yıldız Technical University, Beşiktaş, İstanbul, Turkey

  • Venue:
  • NLDB'06 Proceedings of the 11th international conference on Applications of Natural Language to Information Systems
  • Year:
  • 2006

Abstract

In this study, the first comprehensive n-gram-based text classification for Turkish is carried out. We address three tasks: automatically identifying the author of a Turkish document, classifying documents by genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classifiers, and their results are compared. Success rates of 83%, 93% and 96% were obtained for determining the author of the text, the genre of the text and the gender of the author, respectively.
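The n-gram approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes character bigrams as features and uses a hand-written multinomial Naive Bayes with Laplace smoothing (Naive Bayes being one of the four classifiers listed). The helper `char_ngrams`, the class `NGramNaiveBayes`, and the toy Turkish genre data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the character n-grams of a lowercased text (assumed feature choice)."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.ngram_counts = defaultdict(Counter)  # class -> n-gram -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        # Total n-gram occurrences per class, used in the smoothed denominator.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.ngram_counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus summed log likelihoods of the text's n-grams, per class."""
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)
            counts = self.ngram_counts[label]
            denom = self.totals[label] + self.alpha * v
            for g in char_ngrams(text, self.n):
                score += math.log((counts[g] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy genre data (hypothetical): two economy and two sport sentences.
train_texts = [
    "borsa faiz enflasyon dolar",
    "faiz oranı ve borsa endeksi",
    "gol attı maçı kazandı",
    "takım maçta gol yedi",
]
train_labels = ["economy", "economy", "sport", "sport"]

model = NGramNaiveBayes(n=2).fit(train_texts, train_labels)
print(model.predict("borsa faiz"))
print(model.predict("gol maç"))
```

The same feature extraction could feed the other classifiers in the comparison (SVM, C4.5, Random Forest); Naive Bayes is shown only because it fits in a few lines without external libraries.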