Detecting spam blogs from blog search results

  • Authors:
  • Linhong Zhu;Aixin Sun;Byron Choi

  • Affiliations:
  • School of Computer Engineering, Nanyang Technological University, Singapore;School of Computer Engineering, Nanyang Technological University, Singapore;Department of Computer Science, Hong Kong Baptist University, Hong Kong

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Blogging has been an emerging media for people to express themselves. However, the presence of spam blogs (also known as splogs) may reduce the value of blogs and blog search engines. Hence, splog detection has recently attracted much attention from research. Most existing works on splog detection identify splogs using their content/link features and target on spam filters protecting blog search engines' index from spam. In this paper, we propose a splog detection framework by monitoring the on-line search results. The novelty of our splog detection is that our detection capitalizes on the results returned by search engines. The proposed method therefore is particularly useful in detecting those splogs that have successfully slipped through the spam filters that are also actively generating spam-posts. More specifically, our method monitors the top-ranked results of a sequence of temporally-ordered queries and detects splogs based on blogs' temporal behavior. The temporal behavior of a blog is maintained in a blog profile. Given blog profiles, splog detecting functions have been proposed and evaluated using real data collected from a popular blog search engine. Our experiments have demonstrated that splogs could be detected with high accuracy. The proposed method can be implemented on top of any existing blog search engine without intrusion to the latter.