Spam is a growing problem with a substantial economic cost. Spam filtering is a special case of text categorization whose defining characteristic is an active adversary that constantly attempts to evade the filter. In this paper, we present a novel approach to spam filtering based on the minimum description length principle. The proposed model is fast to construct and can be updated incrementally. We also analyze the measures commonly used to evaluate anti-spam classifiers and propose a new measure that provides a fairer comparison. Finally, we report an empirical evaluation on six well-known, large, public databases; the results indicate that our approach outperforms state-of-the-art spam filters.
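To make the idea of a description-length-based, incrementally updateable filter concrete, the following is a minimal sketch only. The abstract does not specify the model, so everything here is an assumption for illustration: the class name MdlSketchFilter, the smoothed unigram token model, the smoothing constant, and the toy messages are all hypothetical and are not the method proposed in the paper. The sketch assigns a message to the class whose model encodes it in fewer bits, and training only increments counters.

```python
import math
from collections import Counter


class MdlSketchFilter:
    """Toy two-class filter (hypothetical, not the paper's model):
    pick the class whose token model encodes the message in fewer bits."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # assumed additive-smoothing constant
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}
        self.vocab = set()

    def update(self, tokens, label):
        """Incremental training: only counters change, nothing is rebuilt."""
        self.counts[label].update(tokens)
        self.totals[label] += len(tokens)
        self.vocab.update(tokens)

    def description_length(self, tokens, label):
        """Bits needed to encode tokens under the smoothed unigram model of label."""
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        denom = self.totals[label] + self.smoothing * vocab_size
        bits = 0.0
        for tok in tokens:
            p = (self.counts[label][tok] + self.smoothing) / denom
            bits -= math.log2(p)
        return bits

    def classify(self, tokens):
        spam_bits = self.description_length(tokens, "spam")
        ham_bits = self.description_length(tokens, "ham")
        # Ties go to "ham", so a message is flagged only on a strictly shorter spam code.
        return "spam" if spam_bits < ham_bits else "ham"


# Toy usage with made-up messages.
f = MdlSketchFilter()
f.update("win free prize now".split(), "spam")
f.update("free offer click now".split(), "spam")
f.update("meeting agenda for monday".split(), "ham")
f.update("lunch on monday then".split(), "ham")
print(f.classify("free prize now".split()))
print(f.classify("monday meeting agenda".split()))
```

Because update touches only per-class counters, a new labeled message can be folded in without retraining, which is one simple way a model can be both fast to construct and incrementally updateable.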