A comparison of textual data mining methods for sex identification in chat conversations

  • Authors:
  • Cemal Köse;Özcan Özyurt;Cevat İkibaş

  • Affiliations:
  • Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, Trabzon, Turkey;Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, Trabzon, Turkey;Department of Computer Engineering, Faculty of Engineering, Karadeniz Technical University, Trabzon, Turkey

  • Venue:
  • AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
  • Year:
  • 2008

Abstract

Mining textual data from chat media is becoming more important because these media contain a vast amount of information that is potentially relevant to a society's current interests, habits, social behaviors, criminal tendencies, and other tendencies. Here, sex identification is taken as a base study in information mining from chat media. To this end, a simple discrimination function and a semantic analysis method are proposed for sex identification in Turkish chat media. The proposed method is then compared with the Support Vector Machine (SVM) and Naive Bayes (NB) methods. The results show that the proposed system achieves an accuracy of over 90% in sex identification.
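
As a rough illustration of the kind of Naive Bayes baseline the abstract mentions, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on whitespace-separated tokens. It is not the authors' implementation: the function names `train_nb` and `predict_nb`, the whitespace tokenization, and the placeholder tokens and labels in `TOY_SAMPLES` are all assumptions made for this example, and the toy data carries no empirical meaning.

```python
import math
from collections import Counter, defaultdict

# Hypothetical placeholder data: tokens and labels are invented for
# illustration only and do not come from the paper's chat corpus.
TOY_SAMPLES = [
    ("w1 w2 w3", "class_a"),
    ("w1 w2", "class_a"),
    ("w4 w5", "class_b"),
    ("w5 w6 w4", "class_b"),
]


def train_nb(samples):
    """Count documents per class and token occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        for tok in text.lower().split():  # assumed: whitespace tokenization
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior under the model."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_docs):  # sorted for deterministic tie-breaking
        score = math.log(class_docs[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # tokens never seen in training are ignored
            # Add-one (Laplace) smoothing over the training vocabulary.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TOY_SAMPLES)
print(predict_nb(model, "w1 w3"))
print(predict_nb(model, "w4 w6"))
```

A real comparison along the lines of the abstract would replace the toy samples with labeled chat messages and evaluate accuracy on held-out data; Turkish text would also likely need language-aware lowercasing and tokenization, which this sketch does not attempt.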