Beyond blacklists: learning to detect malicious web sites from suspicious URLs

Authors:
Justin Ma;Lawrence K. Saul;Stefan Savage;Geoffrey M. Voelker
Affiliations:
UC San Diego, La Jolla, CA, USA;UC San Diego, La Jolla, CA, USA;UC San Diego, La Jolla, CA, USA;UC San Diego, La Jolla, CA, USA
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009


Abstract

Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs. These methods are able to learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. The resulting classifiers obtain 95-99% accuracy, detecting large numbers of malicious Web sites from their URLs, with only modest false positives.
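
To make the idea of lexical URL classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes a bag-of-words encoding of hostname and path tokens and a plain logistic regression trained by stochastic gradient ascent. The function names, the toy URLs, and the hyperparameters are all hypothetical, and the host-based properties mentioned in the abstract are left out because they need external lookups.

```python
import math
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Map a URL to a sparse dict of lexical features (illustrative choice only)."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    feats = {"bias": 1.0}
    # Bag-of-words over hostname tokens, split on dots and hyphens.
    for tok in re.split(r"[.\-]", host):
        if tok:
            feats["host=" + tok] = 1.0
    # Bag-of-words over path and query tokens, split on common delimiters.
    for tok in re.split(r"[/?.=\-_&]", parsed.path + "?" + parsed.query):
        if tok:
            feats["path=" + tok] = 1.0
    # Two scaled numeric features: number of dots in the host, total URL length.
    feats["host_dots"] = host.count(".") / 10.0
    feats["url_len"] = len(url) / 100.0
    return feats


def predict_proba(weights, feats):
    """Logistic model: estimated probability that the URL is malicious."""
    z = sum(weights.get(k, 0.0) * v for k, v in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for url, label in examples:
            feats = lexical_features(url)
            err = label - predict_proba(weights, feats)
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) + lr * err * v
    return weights


# Hypothetical toy data: label 1 = malicious, 0 = benign.
TOY_DATA = [
    ("http://paypal.com.secure-login.example-bad.net/signin/update.php", 1),
    ("http://192.0.2.10/banking/login/confirm.php", 1),
    ("http://account-verify.example-bad.net/webscr/cmd=login", 1),
    ("http://www.example.org/docs/index.html", 0),
    ("http://news.example.com/2009/kdd/papers.html", 0),
    ("http://www.example.edu/~user/research/", 0),
]

weights = train(TOY_DATA)

if __name__ == "__main__":
    for url, label in TOY_DATA:
        print(label, round(predict_proba(weights, lexical_features(url)), 3), url)
```

A six-URL training set only shows the mechanics. The accuracy figures in the abstract come from models learned over tens of thousands of features, so nothing about performance should be read into this sketch.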