Traditional content-based e-mail spam filtering takes into account the content of e-mail messages and applies machine learning techniques to infer patterns that discriminate spam from ham. The use of content-based filtering has unleashed an unending arms race between spammers and filter developers, given the spammers' ability to continuously change message content in ways that might circumvent current filters. In this paper, we propose to expand the horizons of content-based filters by taking into consideration the content of the Web pages linked by e-mail messages. We describe a methodology for extracting the pages linked by URLs in spam messages, and we characterize the relationship between those pages and the messages. We then use a machine learning technique (a lazy associative classifier) to extract classification rules from the Web pages that are relevant to spam detection. We demonstrate that information from linked pages can complement current spam classification techniques, as represented by SpamAssassin. Our study shows that the pages linked by spam messages are a very promising battleground.
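As an illustration of the first step of such a pipeline, the following is a minimal sketch of how URLs might be pulled out of a raw e-mail message before the linked pages are fetched. It is not the extraction method used in the paper, which the abstract does not detail. The function name `extract_urls`, the pattern `URL_PATTERN`, and the choice to keep only distinct `http`/`https` links from text parts are assumptions made for this example.

```python
import re
from email import message_from_string

# Assumed pattern: http/https URLs up to whitespace or a common delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def extract_urls(raw_message: str) -> list[str]:
    """Return distinct URLs from the text parts of a message, in order of appearance."""
    msg = message_from_string(raw_message)
    urls: list[str] = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        for match in URL_PATTERN.findall(text):
            # Drop sentence punctuation that the pattern may have captured.
            url = match.rstrip(".,;:!?")
            if url not in urls:
                urls.append(url)
    return urls
```

The sketch stops at URL extraction. Fetching the linked pages and feeding their content to a classifier would be separate steps, each with its own design choices.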