A Framework for Collaborative, Content-Based and Demographic Filtering
Artificial Intelligence Review - Special issue on data mining on the Internet
A joint framework for collaborative and content filtering
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
A support vector method for multivariate performance measures
ICML '05 Proceedings of the 22nd international conference on Machine learning
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Spam and the ongoing battle for the inbox
Communications of the ACM - Spam and the ongoing battle for the inbox
Spam Filtering Using Statistical Data Compression Models
The Journal of Machine Learning Research
Relaxed online SVMs for spam filtering
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
An evaluation of Naive Bayes variants in content-based learning for spam filtering
Intelligent Data Analysis
A theory of learning with similarity functions
Machine Learning
Web spam identification through content and hyperlinks
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Spamalytics: an empirical analysis of spam marketing conversion
Proceedings of the 15th ACM conference on Computer and communications security
Feature hashing for large scale multitask learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Regression-based latent factor models
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
On the relative age of spam and ham training samples for email filtering
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Pairwise preference regression for cold-start recommendation
Proceedings of the third ACM conference on Recommender systems
New filtering approaches for phishing email
Journal of Computer Security - EU-Funded ICT Research on Trust and Security
Spamcraft: an inside look at spam campaign orchestration
LEET'09 Proceedings of the 2nd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more
Re: CAPTCHAs: understanding CAPTCHA-solving services in an economic context
USENIX Security'10 Proceedings of the 19th USENIX conference on Security
Impact of spam exposure on user engagement
Security'12 Proceedings of the 21st USENIX conference on Security symposium
Grindstone4Spam: An optimization toolkit for boosting e-mail classification
Journal of Systems and Software
Personalized email recommender system based on user actions
SEAL'12 Proceedings of the 9th international conference on Simulated Evolution and Learning
Over the last decade, email spam has evolved from a mere irritant to users into a genuine danger. This has led web-mail providers and academic researchers to dedicate considerable resources to tackling the problem [9, 21, 22, 24, 26]. However, we argue that existing work does not handle some aspects of the spam filtering problem appropriately. Principal among these are adversarial spammer efforts (spammers routinely tune their spam emails to bypass spam filters, and contaminate ground truth via fake HAM/SPAM votes) and the scale and sparsity of the problem, which essentially preclude learning with a very large set of parameters. In this paper we propose an approach that learns to filter spam by striking a balance between generalizing HAM/SPAM votes across users and emails (to alleviate sparsity) and learning local models for each user (to limit the effect of adversarial votes); votes are shared only among users and emails that are "similar" to one another. Moreover, we define user-user and email-email similarities using spam-resilient features that are extremely difficult for spammers to fake. We give a methodology that learns to combine multiple features into similarity values while directly optimizing the objective of better spam filtering. A useful side effect of this methodology is that the number of parameters to be estimated is very small: this lets us use off-the-shelf learning algorithms to achieve good accuracy while preventing overfitting to the adversarial noise in the data. Finally, our approach gives a systematic way to incorporate existing spam-fighting technologies, such as IP blacklists and keyword-based classifiers, into one framework. Experiments on a real-world email dataset show that our approach leads to significant improvements over two state-of-the-art baselines.
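The idea of sharing votes only among similar users and emails can be illustrated with a minimal sketch. This is not the paper's actual model: the abstract does not specify the functional form, so the logistic combination of features, the product of user and email similarities, the +1/-1 vote encoding, and all names (`similarity`, `spam_score`, `user_feats`, `email_feats`, `w_user`, `w_email`) are assumptions made purely for illustration.

```python
import math

def similarity(features, weights):
    """Combine several pairwise feature values into one similarity in (0, 1).

    Assumption: a logistic function of a weighted sum. The weights are the
    only learned parameters, which keeps the parameter count small.
    """
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def spam_score(target_user, target_email, votes,
               user_feats, email_feats, w_user, w_email):
    """Similarity-weighted average of votes (+1 = SPAM, -1 = HAM).

    A vote counts more when it comes from a user similar to target_user
    and concerns an email similar to target_email.
    """
    num, den = 0.0, 0.0
    for user, email, label in votes:
        s_u = 1.0 if user == target_user else similarity(
            user_feats[(target_user, user)], w_user)
        s_e = 1.0 if email == target_email else similarity(
            email_feats[(target_email, email)], w_email)
        weight = s_u * s_e
        num += weight * label
        den += weight
    return num / den if den > 0 else 0.0

# Hypothetical toy data: u2 resembles u1, u3 does not.
votes = [("u2", "e2", +1), ("u3", "e3", -1)]
user_feats = {("u1", "u2"): [3.0], ("u1", "u3"): [-3.0]}
email_feats = {("e1", "e2"): [2.0], ("e1", "e3"): [2.0]}

score = spam_score("u1", "e1", votes, user_feats, email_feats, [1.0], [1.0])
print(score > 0)  # the similar user's SPAM vote dominates
```

In this sketch a vote from a dissimilar user (for example, a fake account) receives a small weight, so it has limited influence on the score for `u1`. Learning `w_user` and `w_email` against a filtering objective is left out here.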