Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet current spam filters are remarkably effective. They are more effective than might be expected given the uncertainty and debate over a formal definition of spam, and more effective than might be expected given state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam? We survey current and proposed spam filtering techniques, with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media, such as instant messaging and the Web, are addressed peripherally. In doing so, we examine the definition of spam, the user's information requirements, and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results, and their limitations.
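The abstract refers to common evaluation measures without naming them. As one illustration, the sketch below computes three measures used in the TREC Spam Track:

* **hm%:** the percentage of ham labelled spam.
* **sm%:** the percentage of spam labelled ham.
* **lam%:** the logistic average misclassification, which is the inverse logit of the mean of logit(hm) and logit(sm).

Several elements are assumptions of this sketch rather than anything taken from the survey:

* the function names;
* the string labels `"ham"` and `"spam"`;
* the clamping of rates away from 0 and 1.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def misclassification_rates(gold: list[str], predicted: list[str]) -> tuple[float, float]:
    """Return (hm, sm) as fractions.

    hm: fraction of true ham messages labelled spam (false positive rate).
    sm: fraction of true spam messages labelled ham (false negative rate).
    Assumes both classes occur at least once in `gold`.
    """
    pairs = list(zip(gold, predicted))
    ham = [p for g, p in pairs if g == "ham"]
    spam = [p for g, p in pairs if g == "spam"]
    hm = sum(1 for p in ham if p == "spam") / len(ham)
    sm = sum(1 for p in spam if p == "ham") / len(spam)
    return hm, sm


def lam(hm: float, sm: float, eps: float = 1e-6) -> float:
    """Logistic average misclassification of hm and sm.

    Rates are clamped to [eps, 1 - eps] so that logit stays finite; this
    clamping is an assumption of the sketch, not part of the measure's definition.
    """
    hm = min(max(hm, eps), 1.0 - eps)
    sm = min(max(sm, eps), 1.0 - eps)
    return inv_logit((logit(hm) + logit(sm)) / 2.0)


if __name__ == "__main__":
    # Hypothetical toy run: 1 of 4 ham misfiled, 2 of 4 spam missed.
    gold = ["ham"] * 4 + ["spam"] * 4
    predicted = ["spam", "ham", "ham", "ham", "ham", "ham", "spam", "spam"]
    hm, sm = misclassification_rates(gold, predicted)
    print(f"hm% = {100 * hm:.1f}, sm% = {100 * sm:.1f}, lam% = {100 * lam(hm, sm):.1f}")
```

For the toy run this prints `hm% = 25.0, sm% = 50.0, lam% = 36.6`. Because lam% averages in log-odds space, it is symmetric in hm% and sm%, and it reduces to their common value when the two are equal.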
In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.