Learning to detect malicious URLs

Authors:
Justin Ma; Lawrence K. Saul; Stefan Savage; Geoffrey M. Voelker
Affiliations:
University of California, Berkeley; University of California, San Diego; University of California, San Diego; University of California, San Diego
Venue:
ACM Transactions on Intelligent Systems and Technology (TIST)
Year:
2011

Abstract

Malicious Web sites are a cornerstone of Internet criminal activities. The dangers of these sites have created a demand for safeguards that protect end-users from visiting them. This article explores how to detect malicious Web sites from the lexical and host-based features of their URLs. We show that this problem lends itself naturally to modern algorithms for online learning. Online algorithms not only process large numbers of URLs more efficiently than batch algorithms but also adapt more quickly to new features in the continuously evolving distribution of malicious URLs. We develop a real-time system for gathering URL features and pair it with a real-time feed of labeled URLs from a large Web mail provider. From these features and labels, we are able to train an online classifier that detects malicious Web sites with 99% accuracy over a balanced dataset.
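
The general idea of learning from URL features one example at a time can be illustrated with a minimal sketch. It is not the authors' system. It assumes binary bag-of-words lexical features only (host-based features, which would need external lookups, are omitted), and it uses a plain mistake-driven perceptron as a stand-in for whichever online algorithms the article evaluates. The names `lexical_features` and `OnlinePerceptron`, and the two example URLs with their labels, are hypothetical and made up for illustration.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return binary bag-of-words features from a URL's hostname and path.

    Assumption: tokens are split on common URL delimiters; a real system
    would likely use a richer feature set.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    feats = {"bias": 1.0}
    for tok in host.split("."):
        if tok:
            feats["host=" + tok] = 1.0
    for tok in re.split(r"[/?.=&_\-]+", parts.path + "?" + parts.query):
        if tok:
            feats["path=" + tok] = 1.0
    return feats


class OnlinePerceptron:
    """Sparse linear classifier updated one labeled example at a time."""

    def __init__(self):
        self.w = {}  # feature name -> weight; grows as new features appear

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def predict(self, feats):
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Apply a perceptron update if the example is misclassified.

        label is +1 (malicious) or -1 (benign). Returns True on an update.
        """
        if label * self.score(feats) <= 0:
            for k, v in feats.items():
                self.w[k] = self.w.get(k, 0.0) + label * v
            return True
        return False


# Hypothetical labeled stream (made-up URLs on reserved example domains).
stream = [
    ("http://login-secure.example-bank.test/verify/account.php", 1),
    ("http://www.example.org/docs/index.html", -1),
]

clf = OnlinePerceptron()
for _ in range(5):  # replay the tiny stream a few times
    for url, label in stream:
        clf.update(lexical_features(url), label)
```

Because the weight vector is a dictionary keyed by feature name, tokens never seen before simply get new entries when they first appear in a mistake, which is one way an online learner can pick up new features without retraining from scratch.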