Malicious web content detection by machine learning

Authors:
Yung-Tsung Hou; Yimeng Chang; Tsuhan Chen; Chi-Sung Laih; Chia-Mei Chen
Affiliations:
National Sun Yat-Sen University, Kaohsiung, Taiwan; Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; National Cheng Kung University, Tainan, Taiwan; National Sun Yat-Sen University, Kaohsiung, Taiwan
Venue:
Expert Systems with Applications: An International Journal
Year:
2010

Abstract

Recent developments in dynamic HTML (DHTML) give attackers a new and powerful technique for compromising computer systems. Malicious DHTML code is usually embedded in an otherwise normal webpage, and the page infects the victim when the user browses it. Furthermore, such DHTML code can easily disguise itself through obfuscation or transformation, which makes detection even harder. Anti-virus software packages commonly use signature-based approaches, which might not be able to efficiently identify camouflaged malicious HTML code. Therefore, our paper proposes a machine-learning approach to malicious web page detection. Our study systematically analyzes the characteristics of malicious webpages and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscation and can correctly determine whether a webpage is malicious.