In this paper, we propose methods and heuristics with high accuracy and low time complexity for filtering spam e-mails. The methods are based on the n-gram approach, and we devise a heuristic referred to as the first n-words heuristic. Although the main concern of the research is the applicability of these methods to Turkish e-mails, they were also applied to English e-mails. A data set was compiled for both languages, and extensive tests were performed with different parameters. Success rates of about 97% were obtained for Turkish e-mails and above 98% for English e-mails. In addition, we show that the time complexity can be reduced significantly without sacrificing the success rate.
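The abstract does not say how the n-gram features are scored, what n-gram length is used, or how many leading words are kept. The following is therefore only a minimal sketch of how an n-gram filter with a "first n words" cut-off might be wired together. It assumes that the heuristic truncates each e-mail to its first `n_words` tokens before feature extraction. It also assumes character trigrams as features and multinomial naive Bayes with add-one smoothing as the scoring rule. The names `first_n_words`, `char_ngrams`, and `NGramSpamClassifier`, the default parameter values, and the toy training data are all hypothetical. This is not the authors' implementation or data set.

```python
from collections import Counter
from typing import Dict, List
import math


def first_n_words(text: str, n: int) -> List[str]:
    """Hypothetical 'first n-words' step: keep only the first n whitespace tokens."""
    return text.lower().split()[:n]


def char_ngrams(words: List[str], size: int) -> List[str]:
    """Character n-grams of each word, padded with spaces to mark word edges."""
    grams = []
    for w in words:
        padded = f" {w} "
        grams.extend(padded[i:i + size] for i in range(len(padded) - size + 1))
    return grams


class NGramSpamClassifier:
    """Multinomial naive Bayes over character n-grams (an assumed scoring rule)."""

    def __init__(self, n_words: int = 50, gram_size: int = 3):
        self.n_words = n_words      # assumed cut-off: only the first n words are used
        self.gram_size = gram_size  # assumed n-gram length (character trigrams)
        self.counts: Dict[str, Counter] = {}  # label -> n-gram frequencies
        self.docs: Counter = Counter()        # label -> number of training e-mails

    def _features(self, text: str) -> List[str]:
        return char_ngrams(first_n_words(text, self.n_words), self.gram_size)

    def fit(self, emails: List[str], labels: List[str]) -> "NGramSpamClassifier":
        for text, label in zip(emails, labels):
            self.counts.setdefault(label, Counter()).update(self._features(text))
            self.docs[label] += 1
        return self

    def predict(self, text: str) -> str:
        feats = self._features(text)
        vocab_size = len(set().union(*self.counts.values()))
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            score = math.log(self.docs[label] / total_docs)  # log class prior
            for g in feats:
                # Add-one (Laplace) smoothing keeps unseen n-grams from zeroing the score
                score += math.log((counter[g] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up e-mails (illustrative only, not the paper's data set)
clf = NGramSpamClassifier(n_words=20).fit(
    ["win free money now click here", "cheap pills free offer",
     "meeting agenda for tomorrow", "please review the attached report"],
    ["spam", "spam", "ham", "ham"],
)
print(clf.predict("free money offer"))  # prints "spam"
```

In this sketch, truncating each message to its first `n_words` tokens bounds the number of n-grams extracted per e-mail regardless of message length. That is one plausible way such a heuristic could lower running time. How the cut-off trades against accuracy would have to be measured, as the paper reports doing with different parameters.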