Prophiler: a fast filter for the large-scale detection of malicious web pages

Authors:
Davide Canali; Marco Cova; Giovanni Vigna; Christopher Kruegel
Affiliations:
Institute Eurecom, Sophia Antipolis, France; University of Birmingham, Birmingham, United Kingdom; University of California, Santa Barbara, Santa Barbara, CA, USA; University of California, Santa Barbara, Santa Barbara, CA, USA
Venue:
Proceedings of the 20th International Conference on World Wide Web
Year:
2011

Abstract

Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis: they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' execution for malicious activity. While these tools are quite precise, the analysis is costly, often requiring on the order of tens of seconds for a single page. Therefore, performing this analysis on a dataset of hundreds of millions of web pages can be prohibitive. One approach to reducing the resources required for large-scale analysis of malicious web pages is to develop a fast and reliable filter that quickly discards pages that are benign and forwards to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We automatically derive detection models that use these features by applying machine-learning techniques to labeled datasets. To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior.
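The filtering pipeline described above (static features from the HTML, the JavaScript, and the URL; a detection model learned from labeled data; forwarding only likely-malicious pages to dynamic analysis) can be sketched as follows. This is a minimal illustration under stated assumptions, not Prophiler's implementation: the five features, the synthetic training pages, and the helper names (`extract_features`, `train_logistic`, `triage`) are hypothetical, and logistic regression trained by stochastic gradient descent is swapped in as a simple stand-in for whatever classifiers the authors actually used.

```python
import math
import re
from urllib.parse import urlparse

# Hypothetical static features (NOT Prophiler's real feature set), drawn from
# the three sources the abstract names: HTML, JavaScript, and the URL.
IPV4_HOST = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def extract_features(html: str, url: str) -> list[float]:
    lower = html.lower()
    host = urlparse(url).hostname or ""
    return [
        float(lower.count("<iframe")),                     # HTML: iframe tags
        float(lower.count("<script")),                     # HTML: script tags
        float(len(re.findall(r"\beval\s*\(", html))),      # JS: eval() calls
        float(len(re.findall(r"\bunescape\s*\(", html))),  # JS: unescape() calls
        1.0 if IPV4_HOST.fullmatch(host) else 0.0,         # URL: raw IPv4 host
    ]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit a logistic-regression model with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = p - yi  # gradient of the log-loss w.r.t. the linear score
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def malice_score(model, html: str, url: str) -> float:
    """Probability-like score that a page is malicious, in [0, 1]."""
    w, b = model
    x = extract_features(html, url)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def triage(pages, model, threshold=0.5):
    """Split (url, html) pairs into URLs forwarded to dynamic analysis and URLs discarded."""
    forwarded, discarded = [], []
    for url, html in pages:
        if malice_score(model, html, url) >= threshold:
            forwarded.append(url)
        else:
            discarded.append(url)
    return forwarded, discarded


# Toy labeled examples (synthetic, for illustration only): 1 = malicious, 0 = benign.
TRAIN = [
    ("<html><body><p>Hello</p></body></html>", "http://www.example.com/", 0),
    ("<html><script src='app.js'></script></html>", "http://shop.example.org/", 0),
    ("<iframe src='http://10.0.0.1/x' width=0></iframe>"
     "<script>eval(unescape('%61'))</script>", "http://10.0.0.1/x", 1),
    ("<script>eval(s)</script><iframe src='x'></iframe>", "http://a.example.net/", 1),
]
MODEL = train_logistic([extract_features(h, u) for h, u, _ in TRAIN],
                       [label for _, _, label in TRAIN])
```

Calling `triage(pages, MODEL)` on a list of `(url, html)` pairs returns the URLs to forward and the URLs to discard. The `threshold` parameter carries the filter's central trade-off: lowering it forwards more pages to the costly analyzer, which shrinks the load savings, but makes it less likely that a malicious page is discarded.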
Our results show that our filter is able to reduce the load on a more costly dynamic analysis tool by more than 85%, with a negligible number of missed malicious pages.
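The two quantities in this result, load reduction and missed malicious pages, can be made concrete with a small helper. In this sketch (an illustration, not the paper's evaluation code), load reduction is taken to be the fraction of pages the filter does not forward, and the miss rate is the fraction of malicious pages it discards; the function names, the 1,000-page run, and the per-page cost are hypothetical.

```python
def filter_metrics(decisions):
    """decisions: iterable of (forwarded, is_malicious) boolean pairs, one per page.

    Returns (load_reduction, miss_rate): the fraction of pages kept away from
    dynamic analysis, and the fraction of malicious pages the filter discarded.
    """
    decisions = list(decisions)
    total = len(decisions)
    forwarded = sum(1 for fwd, _ in decisions if fwd)
    malicious = sum(1 for _, mal in decisions if mal)
    missed = sum(1 for fwd, mal in decisions if mal and not fwd)
    load_reduction = 1.0 - forwarded / total if total else 0.0
    miss_rate = missed / malicious if malicious else 0.0
    return load_reduction, miss_rate


def dynamic_analysis_hours(num_pages, seconds_per_page, load_reduction=0.0):
    """Estimated sequential dynamic-analysis time, in hours, for the forwarded pages."""
    return num_pages * (1.0 - load_reduction) * seconds_per_page / 3600.0


# Hypothetical run: 1,000 pages, 140 forwarded, all 10 malicious pages among them.
decisions = [(True, True)] * 10 + [(True, False)] * 130 + [(False, False)] * 860
reduction, miss_rate = filter_metrics(decisions)
```

With an assumed cost of 10 seconds per page (the low end of "tens of seconds"), `dynamic_analysis_hours(100_000_000, 10)` gives about 277,800 hours of sequential analysis for 100 million pages, and an 85% load reduction cuts this to roughly 41,700 hours.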