Malicious web content detection by machine learning

  • Authors:
  • Yung-Tsung Hou;Yimeng Chang;Tsuhan Chen;Chi-Sung Laih;Chia-Mei Chen

  • Affiliations:
  • National Sun Yat-Sen University, Kaohsiung, Taiwan;Carnegie Mellon University, Pittsburgh, Pennsylvania, USA;Carnegie Mellon University, Pittsburgh, Pennsylvania, USA;National Cheng Kung University, Tainan, Taiwan;National Sun Yat-Sen University, Kaohsiung, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Quantified Score

Hi-index 12.05

Visualization

Abstract

The recent development of the dynamic HTML gives attackers a new and powerful technique to compromise computer systems. A malicious dynamic HTML code is usually embedded in a normal webpage. The malicious webpage infects the victim when a user browses it. Furthermore, such DHTML code can disguise itself easily through obfuscation or transformation, which makes the detection even harder. Anti-virus software packages commonly use signature-based approaches which might not be able to efficiently identify camouflaged malicious HTML codes. Therefore, our paper proposes a malicious web page detection using the technique of machine learning. Our study analyzes the characteristic of a malicious webpage systematically and presents important features for machine learning. Experimental results demonstrate that our method is resilient to code obfuscations and can correctly determine whether a webpage is malicious or not.