In this paper, we propose spam e-mail filtering methods that achieve high accuracy with low time complexity. The methods are based on the n-gram approach and a heuristic referred to as the first n-words heuristic. We develop two models, a class general model and an e-mail specific model, and test the methods under both. The models are then combined so that the e-mail specific model is activated in the cases where the class general model falls short. Although the approach and the methods are general and can be applied to any language, we mainly apply them to Turkish, an agglutinative language, and examine some of its properties. Extensive tests yielded success rates of about 98% for Turkish and 99% for English. The results show that the time complexity can be reduced significantly without sacrificing performance.
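The abstract does not spell out how the two models work internally, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes character trigrams, keeps only the first n words of each e-mail (the first n-words heuristic), and scores the class general model with add-one smoothed log-likelihoods under equal class priors. As a stand-in for the e-mail specific model it swaps in a nearest-neighbour lookup (Jaccard similarity over stored training e-mails). A hypothetical log-score `margin` decides when the first model "falls short". All names and default values (`SpamFilter`, `first_words=50`, `margin=1.0`) are illustrative assumptions.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")


def first_n_words(text: str, n: int) -> str:
    """First n-words heuristic: keep only the first n whitespace-separated words."""
    return " ".join(text.split()[:n])


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams of the lower-cased text."""
    t = text.lower()
    return [t[i:i + n] for i in range(len(t) - n + 1)]


class SpamFilter:
    """Hypothetical combination of a class general and an e-mail specific model."""

    def __init__(self, n: int = 3, first_words: int = 50, margin: float = 1.0):
        self.n = n                      # n-gram length (assumed value)
        self.first_words = first_words  # words kept by the heuristic (assumed value)
        self.margin = margin            # fallback threshold in log units (assumed)
        self.counts = {c: Counter() for c in LABELS}  # class general n-gram tables
        self.totals = {c: 0 for c in LABELS}
        self.vocab: set[str] = set()
        self.examples: list[tuple[set[str], str]] = []  # e-mail specific store

    def _features(self, text: str) -> list[str]:
        # Apply the first n-words heuristic before extracting n-grams.
        return char_ngrams(first_n_words(text, self.first_words), self.n)

    def train(self, text: str, label: str) -> None:
        grams = self._features(text)
        self.counts[label].update(grams)
        self.totals[label] += len(grams)
        self.vocab.update(grams)
        self.examples.append((set(grams), label))

    def class_general_scores(self, text: str) -> dict[str, float]:
        # Add-one smoothed log-likelihood of the e-mail under each class.
        v = len(self.vocab) + 1
        grams = self._features(text)
        return {
            c: sum(math.log((self.counts[c][g] + 1) / (self.totals[c] + v)) for g in grams)
            for c in LABELS
        }

    def email_specific_label(self, text: str) -> str:
        # Label of the most similar training e-mail (Jaccard over n-gram sets).
        q = set(self._features(text))

        def jaccard(s: set[str]) -> float:
            union = q | s
            return len(q & s) / len(union) if union else 0.0

        _, label = max(self.examples, key=lambda ex: jaccard(ex[0]))
        return label

    def classify(self, text: str) -> str:
        s = self.class_general_scores(text)
        if abs(s["spam"] - s["ham"]) >= self.margin:
            return max(s, key=s.get)
        # Class general model is undecided: fall back to the e-mail specific model.
        return self.email_specific_label(text)
```

Truncating to the first n words bounds the number of n-grams scored per e-mail, which is one way the time cost could be reduced. Calling `train` on labelled texts and then `classify` on a new message returns `"spam"` or `"ham"`. The e-mail specific model is consulted only when the two class scores are within `margin` of each other.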