Adaptive anti-spam filtering for agglutinative languages: a special case for Turkish

  • Authors:
  • Levent Özgür;Tunga Güngör;Fikret Gürgen

  • Affiliations:
  • Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey;Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey;Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2004

Quantified Score

Hi-index 0.11

Abstract

We propose anti-spam filtering methods for agglutinative languages in general and for Turkish in particular. The methods are dynamic and are based on Artificial Neural Networks (ANN) and Bayesian Networks. The developed algorithms are user-specific and adapt themselves to the characteristics of the incoming e-mails. The algorithms have two main components. The first one deals with the morphology of the words and the second one classifies the e-mails by using the roots of the words extracted by the morphological analysis. Two ANN structures, single layer perceptron and multi-layer perceptron, are considered and the inputs to the networks are determined using a binary model and a probabilistic model. Similarly, for Bayesian classification, three different approaches are employed: binary model, probabilistic model, and advanced probabilistic model. In the experiments, a total of 750 e-mails (410 spam and 340 normal) were used and a success rate of about 90% was achieved.
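
The two-component design described in the abstract (root extraction, then classification on roots) can be illustrated with a minimal sketch. This is not the authors' implementation: the suffix list, the function names (`toy_root`, `root_set`, `train`, `classify`), and the four training messages are all hypothetical, the plural-only suffix stripper merely stands in for a real Turkish morphological analyzer, and the classifier is a plain Laplace-smoothed naive Bayes over binary root-presence features, which is only one plausible reading of the "binary model". Input is assumed to be lowercase already.

```python
import math
from collections import Counter

# Hypothetical stand-in for morphological analysis: strip a plural suffix only.
# A real system would use a full Turkish morphological analyzer.
SUFFIXES = ("lar", "ler")


def toy_root(word):
    """Return the word with one known suffix removed, if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_set(text):
    """Binary representation of a message: the set of roots it contains."""
    return {toy_root(w) for w in text.split()}


def train(messages):
    """messages: list of (text, label). Counts messages per class and, for each
    class, the number of messages in which each root appears."""
    class_counts = Counter()
    root_counts = {}
    for text, label in messages:
        class_counts[label] += 1
        root_counts.setdefault(label, Counter()).update(root_set(text))
    vocab = set().union(*root_counts.values())
    return class_counts, root_counts, vocab


def classify(text, model):
    """Pick the class with the highest log-posterior under a Bernoulli
    naive Bayes model with Laplace smoothing."""
    class_counts, root_counts, vocab = model
    present = root_set(text)
    total = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        log_p = math.log(n_docs / total)
        for root in vocab:
            p = (root_counts[label][root] + 1) / (n_docs + 2)
            log_p += math.log(p if root in present else 1.0 - p)
        scores[label] = log_p
    return max(scores, key=scores.get)


# Made-up training data, for illustration only.
model = train([
    ("kampanya indirimler bedava", "spam"),
    ("bedava kampanyalar fırsat", "spam"),
    ("toplantı yarın saat", "normal"),
    ("dersler toplantılar yarın", "normal"),
])
print(classify("kampanyalar bedava", model))
print(classify("toplantılar yarın", model))
```

Reducing inflected forms such as "kampanyalar" to the root "kampanya" before counting lets different surface forms of the same word share evidence, which is the motivation the abstract gives for handling morphology first in an agglutinative language.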