Spam filtering is a text categorization task that has attracted significant attention due to the ever-growing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose a statistical but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods is either limited in expressivity or imposes a large computational cost. We introduce orthogonal sparse bigrams (OSB) as a feature combination technique that overcomes both of these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques, we reach an accuracy of 99.68% on a difficult test corpus, compared to the 98.88% previously reported by the CRM114 classifier on the same corpus.
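To make the two ingredients concrete, the following is a minimal sketch of a Winnow-style classifier over OSB features. The class name, the promotion/demotion factors, and the choice of threshold (equal to the number of active features) are illustrative assumptions for this sketch, not the exact settings used in the paper; OSB here pairs each token with each of the preceding tokens inside a sliding window, marking skipped positions.

```python
class WinnowOSB:
    """Sketch of a Winnow classifier over orthogonal sparse bigrams (OSB).

    Assumptions (not the paper's exact configuration): window size 5,
    promotion factor alpha, demotion factor beta, and a threshold equal
    to the number of active features in the message.
    """

    def __init__(self, window=5, alpha=1.35, beta=0.8):
        self.window = window
        self.alpha = alpha   # multiplicative promotion on missed spam
        self.beta = beta     # multiplicative demotion on false positives
        self.weights = {}    # feature -> weight, defaulting to 1.0

    def _features(self, tokens):
        """Generate OSB features: pair each token with each of the
        preceding window-1 tokens, encoding the gap with <skip> markers."""
        feats = []
        for i in range(1, len(tokens)):
            for dist in range(1, self.window):
                j = i - dist
                if j < 0:
                    break
                gap = "<skip> " * (dist - 1)
                feats.append(f"{tokens[j]} {gap}{tokens[i]}")
        return feats

    def score(self, tokens):
        """Return (summed weight of active features, feature count)."""
        feats = self._features(tokens)
        total = sum(self.weights.get(f, 1.0) for f in feats)
        return total, len(feats)

    def train(self, tokens, is_spam):
        """Mistake-driven update: adjust weights only on a wrong prediction."""
        total, n = self.score(tokens)
        predicted_spam = total > n  # threshold theta = number of active features
        if predicted_spam != is_spam:
            factor = self.alpha if is_spam else self.beta
            for f in self._features(tokens):
                self.weights[f] = self.weights.get(f, 1.0) * factor
```

For example, with a window of 3 the token sequence `buy cheap pills` yields the features `buy cheap`, `cheap pills`, and `buy <skip> pills`; the skip marker is what keeps the feature space sparse yet expressive. Because Winnow updates multiplicatively and only on mistakes, training remains cheap even over this enlarged feature space.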