Spam email filtering using network-level properties

Authors:
Paulo Cortez;André Correia;Pedro Sousa;Miguel Rocha;Miguel Rio
Affiliations:
Dep. of Information Systems, Algoritmi, University of Minho, Guimarães, Portugal;Dep. of Information Systems, Algoritmi, University of Minho, Guimarães, Portugal;Department of Electronic and Electrical Engineering, University College London, London, UK;Department of Electronic and Electrical Engineering, University College London, London, UK;Dep. of Informatics, University of Minho, Braga, Portugal
Venue:
ICDM'10 Proceedings of the 10th industrial conference on Advances in data mining: applications and theoretical aspects
Year:
2010

Abstract

Spam is a serious problem that affects email users (e.g. phishing attacks, viruses and time spent reading unwanted messages). We propose a novel spam email filtering approach based on network-level attributes (e.g. the geographic coordinates of the sender's IP address) that are more persistent over time than message content. This approach was tested using two classifiers, Naive Bayes (NB) and Support Vector Machines (SVM), and compared against bag-of-words models and eight blacklists. Several experiments were held with recently collected legitimate (ham) and non-legitimate (spam) messages, in order to simulate distinct user profiles from two countries (USA and Portugal). Overall, the network-level based SVM model achieved the best discriminatory performance. Moreover, preliminary results suggest that this method is more robust to phishing attacks.
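
As a rough illustration of classifying messages from network-level attributes rather than content, the sketch below fits a Gaussian Naive Bayes model on the latitude and longitude of the sender IP. It is not the authors' implementation: the Gaussian variant of NB, the function names `fit_gaussian_nb` and `predict`, and the coordinate values are all assumptions made for the example, and the abstract does not state which other attributes or preprocessing the paper uses.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small variance floor avoids division by zero on constant features.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one message."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=log_posterior)


# Hypothetical (latitude, longitude) pairs of sender IPs, invented for
# illustration only; they are not taken from the paper's data.
train_rows = [
    (38.7, -9.1), (41.5, -8.4), (40.2, -8.4),   # labelled ham
    (0.5, 60.2), (-1.0, 58.9), (1.2, 61.5),     # labelled spam
]
train_labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

model = fit_gaussian_nb(train_rows, train_labels)
print(predict(model, (39.5, -8.0)))
```

The same feature vectors could be passed to an SVM instead. The toy set here is far too small to say anything about real-world performance.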