This paper evaluates five supervised learning methods in the context of statistical spam filtering. We study the impact of different feature pruning methods and feature set sizes on each learner's performance using cost-sensitive measures. We observe that the importance of feature selection varies greatly from classifier to classifier. In particular, support vector machines (SVM), AdaBoost, and the maximum entropy model are the top performers in this evaluation and share similar characteristics: they are insensitive to the feature selection strategy, scale easily to very high feature dimensions, and perform well across different datasets. In contrast, naive Bayes, a classifier commonly used in spam filtering, is sensitive to the feature selection method when the feature set is small, and performs poorly in scenarios where false positives are heavily penalized. The experiments also suggest that aggressive feature pruning should be avoided when building filters for applications in which misclassifying a legitimate mail costs much more than misclassifying a spam (e.g., λ = 999), if the filter is to maintain better-than-baseline performance. An interesting finding concerns the effect of mail headers, which previous studies have often ignored. Experiments show that classifiers using features from the message header alone can achieve performance comparable to or better than that of filters using only body features. This implies that message headers can be a reliable and highly discriminative source of features for spam filtering.
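The abstract refers to cost-sensitive measures, a cost ratio λ = 999, and "better-than-baseline performance" without defining them. The sketch below assumes the weighted accuracy and total cost ratio (TCR) measures that are common in the spam-filtering literature, where blocking a legitimate message (a false positive) costs λ times as much as letting a spam through, and the baseline is using no filter at all. The abstract does not confirm that these are the exact measures used; the class and function names (`Confusion`, `weighted_accuracy`, `total_cost_ratio`) and the message counts in the usage lines are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical container for a spam filter's decision counts (L = legit, S = spam)."""
    legit_to_legit: int  # n_{L->L}: legitimate mail correctly delivered
    legit_to_spam: int   # n_{L->S}: false positives (legitimate mail blocked)
    spam_to_spam: int    # n_{S->S}: spam correctly blocked
    spam_to_legit: int   # n_{S->L}: false negatives (spam delivered)

    @property
    def n_legit(self) -> int:
        return self.legit_to_legit + self.legit_to_spam

    @property
    def n_spam(self) -> int:
        return self.spam_to_spam + self.spam_to_legit


def weighted_accuracy(c: Confusion, lam: float) -> float:
    """Accuracy where each legitimate message counts lam times and each spam once."""
    return (lam * c.legit_to_legit + c.spam_to_spam) / (lam * c.n_legit + c.n_spam)


def baseline_weighted_accuracy(c: Confusion, lam: float) -> float:
    """Weighted accuracy of the 'no filter' baseline, which delivers every message."""
    return (lam * c.n_legit) / (lam * c.n_legit + c.n_spam)


def total_cost_ratio(c: Confusion, lam: float) -> float:
    """TCR = baseline cost / filter cost; TCR > 1 means the filter beats no filter."""
    filter_cost = lam * c.legit_to_spam + c.spam_to_legit
    if filter_cost == 0:
        return float("inf")  # a perfect filter incurs no cost
    return c.n_spam / filter_cost


if __name__ == "__main__":
    # Hypothetical counts: 600 legitimate and 400 spam messages.
    example = Confusion(legit_to_legit=598, legit_to_spam=2,
                        spam_to_spam=380, spam_to_legit=20)
    for lam in (1, 9, 999):
        print(f"lambda={lam}: WAcc={weighted_accuracy(example, lam):.4f} "
              f"baseline={baseline_weighted_accuracy(example, lam):.4f} "
              f"TCR={total_cost_ratio(example, lam):.3f}")
```

With these illustrative counts, TCR is about 18.2 at λ = 1 but falls to roughly 0.2 at λ = 999: two blocked legitimate messages then cost 1998, more than delivering all 400 spams, so the filter does worse than no filter. This is why, under a high λ, even a handful of false positives (for example, from a filter built on an aggressively pruned feature set) can push a filter below the baseline.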