Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering

Authors:
Christian Siefkes;Fidelis Assis;Shalendra Chhabra;William S. Yerazunis
Affiliations:
Freie Universität Berlin, Berlin, Germany;Empresa Brasileira de Telecomunicações - Embratel, Rio de Janeiro, RJ, Brazil;University of California, Riverside, California;Mitsubishi Electric Research Laboratories, Cambridge, MA
Venue:
PKDD '04 Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2004

Abstract

Spam filtering is a text categorization task that has attracted significant attention because of the ever-growing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose a statistical but non-probabilistic classifier based on the Winnow algorithm. The feature space used by most current methods is either limited in expressivity or computationally expensive. We introduce orthogonal sparse bigrams (OSB), a feature combination technique that overcomes both of these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques, we reach an accuracy of 99.68% on a difficult test corpus, compared with 98.88% previously reported for the CRM114 classifier on the same corpus.
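The abstract names its two ingredients without defining them. The Python sketch below illustrates both under stated assumptions; it is not the authors' code. `osb_features` pairs each token with each of the up to four tokens before it (a window of five tokens is assumed), writing one `<skip>` marker per token skipped in between, so every feature names exactly two tokens plus their distance. `Winnow` is a textbook mistake-driven Winnow with a single weight vector: it computes a normalized score over active features and applies multiplicative promotion/demotion, plus a "thick threshold" margin so near misses are also trained on. All names and parameter values (promotion 1.23, demotion 0.83, threshold 1.0, margin 0.05) are hypothetical choices for illustration. The paper's actual Winnow variant, feature encoding, and settings may differ.

```python
from typing import Dict, List, Set


def osb_features(tokens: List[str], window: int = 5) -> Set[str]:
    """Orthogonal sparse bigrams (sketch, window size assumed): pair each
    token with each of the up to (window - 1) tokens preceding it, inserting
    one <skip> marker per token skipped in between."""
    features: Set[str] = set()
    for i, newest in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            features.add(" ".join([tokens[j]] + ["<skip>"] * (dist - 1) + [newest]))
    return features


class Winnow:
    """Mistake-driven Winnow with one weight vector (hypothetical sketch).

    Parameter defaults are illustrative assumptions, not values from the paper.
    """

    def __init__(self, promotion: float = 1.23, demotion: float = 0.83,
                 threshold: float = 1.0, margin: float = 0.05) -> None:
        self.promotion = promotion  # multiplier when a spam message scores too low
        self.demotion = demotion    # multiplier when a ham message scores too high
        self.threshold = threshold
        self.margin = margin        # thick threshold: also update on near misses
        self.weights: Dict[str, float] = {}  # unseen features count as weight 1.0

    def score(self, features: Set[str]) -> float:
        # Average weight of the active features; while all weights are at
        # their initial value 1.0, the score equals the default threshold.
        if not features:
            return 0.0
        return sum(self.weights.get(f, 1.0) for f in features) / len(features)

    def predict(self, features: Set[str]) -> bool:
        return self.score(features) > self.threshold

    def train(self, features: Set[str], is_spam: bool) -> None:
        # Incremental, error-driven update: weights change only when the
        # message is misclassified or falls inside the margin band.
        s = self.score(features)
        if is_spam and s <= self.threshold + self.margin:
            self._scale(features, self.promotion)
        elif not is_spam and s >= self.threshold - self.margin:
            self._scale(features, self.demotion)

    def _scale(self, features: Set[str], factor: float) -> None:
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor


if __name__ == "__main__":
    print(sorted(osb_features(["a", "b", "c"])))  # ['a <skip> c', 'a b', 'b c']
    clf = Winnow()
    spam = osb_features("buy cheap pills now".split())
    ham = osb_features("meeting notes for the project".split())
    clf.train(spam, is_spam=True)
    clf.train(ham, is_spam=False)
    print(clf.predict(spam), clf.predict(ham))  # True False
```

With this encoding, each token position contributes at most `window - 1` features, so the feature count grows linearly with the window size. Messages can be fed to `train` one at a time as they arrive, which matches the incremental setting named in the title.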