Developing methods and heuristics with low time complexities for filtering spam messages

Authors:
Tunga Güngör;Ali Çiltik
Affiliations:
Boğaziçi University, Computer Engineering Department, Istanbul, Turkey;Boğaziçi University, Computer Engineering Department, Istanbul, Turkey
Venue:
NLDB'07 Proceedings of the 12th international conference on Applications of Natural Language to Information Systems
Year:
2007

Abstract

In this paper, we propose methods and heuristics with high accuracy and low time complexity for filtering spam e-mails. The methods are based on the n-gram approach, and a heuristic referred to as the first n-words heuristic is devised. Although the main concern of the research is the applicability of these methods to Turkish e-mails, they were also applied to English e-mails. A data set was compiled for both languages, and extensive tests were performed with different parameters. Success rates of about 97% for Turkish e-mails and above 98% for English e-mails were obtained. In addition, it is shown that the time complexity can be reduced significantly without sacrificing the success rate.
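
The abstract does not give the details of the n-gram models or of the first n-words heuristic, so the following is only a minimal sketch of how such a filter might look, not the authors' implementation. It assumes that the heuristic means "consider only the first n words of each message", that words are split on whitespace, that each class is modelled by add-one-smoothed n-gram counts, and that class priors are ignored. All names (`first_n_words`, `NgramClassModel`, `classify`) are hypothetical.

```python
import math
from collections import Counter


def first_n_words(text, n):
    """Assumed heuristic: keep only the first n whitespace-separated words."""
    return text.lower().split()[:n]


def ngrams(tokens, order):
    """Return all contiguous n-grams of the given order as tuples."""
    return [tuple(tokens[i:i + order]) for i in range(len(tokens) - order + 1)]


class NgramClassModel:
    """Per-class n-gram counts (hypothetical helper, not from the paper)."""

    def __init__(self, order=1):
        self.order = order
        self.counts = Counter()
        self.total = 0

    def train(self, messages, n_words):
        # Only the first n_words words of each message are counted,
        # which bounds the work done per message.
        for msg in messages:
            grams = ngrams(first_n_words(msg, n_words), self.order)
            self.counts.update(grams)
            self.total += len(grams)

    def log_score(self, tokens, vocab_size):
        # Sum of add-one-smoothed log relative frequencies of the n-grams.
        score = 0.0
        for gram in ngrams(tokens, self.order):
            score += math.log((self.counts[gram] + 1) / (self.total + vocab_size))
        return score


def classify(message, spam_model, normal_model, n_words):
    """Label a message by comparing its score under the two class models."""
    tokens = first_n_words(message, n_words)
    # Shared vocabulary size (+1 for unseen n-grams) used for smoothing.
    vocab_size = len(set(spam_model.counts) | set(normal_model.counts)) + 1
    spam_score = spam_model.log_score(tokens, vocab_size)
    normal_score = normal_model.log_score(tokens, vocab_size)
    return "spam" if spam_score > normal_score else "normal"


# Toy usage with made-up messages (illustrative only, not the paper's data).
spam_model = NgramClassModel(order=1)
normal_model = NgramClassModel(order=1)
spam_model.train(["win free money now", "free prize win now"], n_words=3)
normal_model.train(["meeting agenda for monday",
                    "project report attached for review"], n_words=3)
label = classify("win free prize today", spam_model, normal_model, n_words=3)
```

Because only the first `n_words` words are processed, the per-message cost in this sketch is bounded by `n_words` rather than by the message length, which is one plausible way a reduction in running time could arise.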