An evaluation of machine learning-based methods for detection of phishing sites

Authors:
Daisuke Miyamoto;Hiroaki Hazeyama;Youki Kadobayashi
Affiliations:
Nara Institute of Science and Technology, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan;Nara Institute of Science and Technology, Ikoma, Nara, Japan
Venue:
ICONIP'08 Proceedings of the 15th international conference on Advances in neuro-information processing - Volume Part I
Year:
2008

Abstract

In this paper, we evaluate the performance of machine learning-based methods for detecting phishing sites. We employ nine machine learning techniques: AdaBoost, Bagging, Support Vector Machines, Classification and Regression Trees, Logistic Regression, Random Forests, Neural Networks, Naive Bayes, and Bayesian Additive Regression Trees. We let each technique combine heuristics, and use the resulting machine learning-based detection methods to distinguish phishing sites from other sites. We analyze our dataset, which is composed of 1,500 phishing sites and 1,500 legitimate sites, classify the sites with each detection method, and measure the performance. As performance metrics we use the F1 measure, the error rate, and the Area Under the ROC Curve (AUC), in accordance with our requirements for detection methods. The highest F1 measure (0.8581), the lowest error rate (14.15%), and the highest AUC (0.9342) are all observed with AdaBoost. We also observe that 7 of the 9 machine learning-based detection methods outperform the traditional detection method.
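The three metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions (F1 as the harmonic mean of precision and recall, error rate as the fraction of misclassified sites, AUC as the probability that a phishing site is scored above a legitimate one), with phishing as the positive class. The function names, the toy labels and scores, and the 0.5 decision threshold are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f1_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall; label 1 = phishing (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of sites whose predicted label differs from the true label."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one site of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 phishing sites (1) and 3 legitimate sites (0).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # classifier's phishing scores
THRESHOLD = 0.5  # assumed cut-off for turning scores into labels
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

print("F1 measure:", f1_measure(y_true, y_pred))
print("Error rate:", error_rate(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

F1 and error rate depend on the chosen threshold, whereas AUC is computed from the raw scores and so summarizes ranking quality across all thresholds. This is one reason the three metrics can rank classifiers differently.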