Detecting splogs using similarities of splog HTML structures

  • Authors:
  • Taichi Katayama;Takayuki Yoshinaka;Takehito Utsuro;Yasuhide Kawada;Tomohiro Fukuhara

  • Affiliations:
  • University of Tsukuba, Tsukuba, Japan;Tokyo Denki University, Tokyo, Japan;University of Tsukuba, Tsukuba, Japan;Navix Co., Ltd., Tokyo, Japan;University of Tokyo, Kashiwa, Japan

  • Venue:
  • Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2010

Abstract

Spam blogs, or splogs, are blogs that host spam posts built from machine-generated or hijacked content, created for the sole purpose of hosting advertisements or increasing the number of inlinks to target sites. Among such splogs, this paper focuses on detecting groups of splogs presumed to have been created by the same spammer. In particular, we show that similarities in HTML structure among splogs created by the same spammer help improve splog detection performance. To measure these similarities, we extract a list of blocks (the minimum units of content) from the DOM tree of each HTML file. We show that the HTML files of splogs presumed to be created by the same spammer tend to have similar DOM trees, and that this tendency is quite effective for splog detection.
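
As a rough illustration of the idea in the abstract, the following Python sketch extracts a list of blocks from an HTML file and compares two pages by their block lists. The abstract does not specify how blocks are defined or which similarity measure is used, so everything here is an assumption: a "block" is taken to be any text-bearing node, identified by its tag path from the root, and similarity is the matching ratio computed by the standard library's difflib. The names BlockExtractor, block_list, and structure_similarity are hypothetical and do not come from the paper.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Text inside these elements is code or styling, not page content.
SKIP_TAGS = {"script", "style"}


class BlockExtractor(HTMLParser):
    """Collect one tag path per text-bearing node (an assumed block definition)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, root first
        self.blocks = []  # tag paths of text-bearing nodes, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and not SKIP_TAGS.intersection(self.stack):
            self.blocks.append("/".join(self.stack))


def block_list(html):
    """Return the list of block tag paths for an HTML string."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def structure_similarity(html_a, html_b):
    """Similarity in [0, 1] between the block lists of two pages."""
    matcher = SequenceMatcher(None, block_list(html_a), block_list(html_b),
                              autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    page_a = "<html><body><div><h1>Buy now</h1><p>cheap pills</p></div></body></html>"
    page_b = "<html><body><div><h1>Best deals</h1><p>cheap watches</p></div></body></html>"
    print(block_list(page_a))
    print(structure_similarity(page_a, page_b))
```

Because only tag paths are compared, two pages generated from the same template score highly even when their text differs entirely, which matches the intuition that splogs from one spammer share HTML structure. Any decision threshold on this score would need to be tuned on labeled data.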