Web Spam Detection by Exploring Densely Connected Subgraphs

  • Authors:
  • Yutaka I. Leon-Suematsu; Kentaro Inui; Sadao Kurohashi; Yutaka Kidawara

  • Venue:
  • WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2011

Abstract

In this paper, we present a Web spam detection algorithm that relies on link analysis. The method consists of three steps: (1) decomposing the web graph into densely connected subgraphs and calculating features for each subgraph, (2) using SVM classifiers to identify subgraphs composed of Web spam, and (3) propagating the predictions over the web graph with a biased PageRank algorithm to expand the scope of identification. We performed experiments on a public benchmark. An empirical study of the core structure of web graphs suggests that highly ranked non-spam hosts can be identified by examining the coreness of the web graph elements.
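
The abstract does not spell out the decomposition or the propagation rule, so the following is only a minimal sketch of two generic building blocks it mentions: the coreness of graph elements and a biased PageRank. The function names `core_numbers` and `biased_pagerank`, the damping factor of 0.85, the iteration count, and the choice to propagate along out-links are all assumptions made for illustration, not the authors' implementation. The SVM step is omitted; the `seeds` argument stands in for whatever scores a classifier would produce.

```python
from collections import defaultdict


def core_numbers(edges):
    """Coreness of each node in an undirected graph, by repeated peeling.

    Hypothetical helper: removes the node of smallest remaining degree
    until the graph is empty (simple O(n^2) version for readability).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(degree)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])  # coreness never decreases while peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def biased_pagerank(out_links, seeds, damping=0.85, iterations=50):
    """PageRank whose teleport step jumps only to weighted seed nodes.

    out_links: dict mapping a node to a list of nodes it links to.
    seeds: dict mapping seed nodes to non-negative weights (assumed to
    come from an upstream classifier; values here are illustrative).
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed must have positive weight")
    teleport = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = out_links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: hand its mass back to the seed distribution.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


# Toy usage: a triangle (a, b, c) with a pendant node d attached to a.
toy_core = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])

# Toy usage: "s" is the only seed; "y" links to "s" but nothing links to "y".
toy_rank = biased_pagerank({"s": ["x"], "x": ["s"], "y": ["s"]}, {"s": 1.0})
```

In the toy graph, the triangle nodes end up with a higher coreness than the pendant node, and the node that is neither a seed nor reachable from one receives no score. To score the pages that link to known spam instead, one could pass the reversed link structure as `out_links`; which direction the paper uses is not stated in the abstract.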