A comparison of machine learning techniques for phishing detection

Authors:
Saeed Abu-Nimeh;Dario Nappa;Xinlei Wang;Suku Nair
Affiliations:
Southern Methodist University, Dallas, TX;Southern Methodist University, Dallas, TX;Southern Methodist University, Dallas, TX;Southern Methodist University, Dallas, TX
Venue:
Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit
Year:
2007

Abstract

Many applications are available for phishing detection. However, unlike spam prediction, only a few studies compare machine learning techniques for predicting phishing. The present study compares the predictive accuracy of several machine learning methods for predicting phishing emails: Logistic Regression (LR), Classification and Regression Trees (CART), Bayesian Additive Regression Trees (BART), Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NNet). The comparison uses a data set of 2889 phishing and legitimate emails, and 43 features are used to train and test the classifiers.
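
As a rough illustration of what comparing the predictive accuracy of several classifiers on one data set can look like, the sketch below scores each candidate by its error rate on held-out folds. It is not the study's method: the abstract does not state the evaluation protocol, so the k-fold cross-validation and the error-rate metric are assumptions. The two classifiers (a majority-class baseline and a nearest-centroid rule) are simple stand-ins rather than LR, CART, BART, SVM, RF, or NNet. The data are synthetic, and only the feature count of 43 is taken from the abstract. All function names are hypothetical.

```python
import random

def make_synthetic_emails(n=200, n_features=43, seed=0):
    """Hypothetical stand-in data: binary feature rows, label 1 = phishing."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption: phishing rows switch features on more often.
        p = 0.7 if label == 1 else 0.3
        row = [1 if rng.random() < p else 0 for _ in range(n_features)]
        data.append((row, label))
    return data

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    ones = sum(label for _, label in train)
    majority = 1 if 2 * ones >= len(train) else 0
    return lambda row: majority

def fit_nearest_centroid(train):
    """Stand-in classifier: predict the class whose mean row is closest."""
    n_features = len(train[0][0])
    centroids = {}
    for cls in (0, 1):
        rows = [row for row, label in train if label == cls]
        if rows:  # skip a class that is absent from this training split
            centroids[cls] = [sum(r[j] for r in rows) / len(rows)
                              for j in range(n_features)]

    def predict(row):
        def sq_dist(center):
            return sum((x - m) ** 2 for x, m in zip(row, center))
        return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

    return predict

def cross_validated_error(fit, data, k=10, seed=0):
    """Fraction of misclassified rows over k held-out folds."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = 0
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = fit(train)
        errors += sum(predict(row) != label for row, label in test)
    return errors / len(data)

def compare(classifiers, data, k=10):
    """Map each classifier name to its cross-validated error rate."""
    return {name: cross_validated_error(fit, data, k)
            for name, fit in classifiers.items()}

data = make_synthetic_emails()
results = compare(
    {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid},
    data,
)
for name, err in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: error rate {err:.3f}")
```

Using the same shuffle seed for every classifier keeps the folds identical across candidates, so differences in error rate reflect the classifiers rather than the split.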