Identifying spam link generators for monitoring emerging web spam

  • Authors:
  • Young-joo Chung;Masashi Toyoda;Masaru Kitsuregawa

  • Affiliations:
  • The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan;The University of Tokyo, Tokyo, Japan

  • Venue:
  • Proceedings of the 4th workshop on Information credibility
  • Year:
  • 2010

Abstract

In this paper, we address the question of how to identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new spam links are created by them. By monitoring spam link generators, we can quickly find emerging web spam, which can then be used to update existing spam filters. To classify spam link generators, we investigate various link-based features, including modified PageRank scores based on white and spam seeds, and the same scores computed for neighboring hosts. An online learning algorithm is used to handle large-scale data, and the effectiveness of the various features is examined. Experiments on three yearly archives of the Japanese Web show that we can predict spam link generators with reasonable performance.
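
To make the idea of a "modified PageRank score based on seeds" more concrete, the following is a minimal sketch of a seed-biased PageRank on a toy host graph. It is not the authors' implementation. The function names, host names, damping factor, and iteration count are all hypothetical, and propagating the spam-seeded score over reversed links is an assumption borrowed from Anti-TrustRank-style methods; the abstract does not say which direction is used.

```python
from typing import Dict, List, Set


def seed_biased_pagerank(
    out_links: Dict[str, List[str]],
    seeds: Set[str],
    damping: float = 0.85,   # assumed value, not taken from the paper
    iterations: int = 50,    # assumed value, not taken from the paper
) -> Dict[str, float]:
    """PageRank whose random jump lands only on the seed hosts.

    Every host must appear as a key of out_links, and every seed
    must be one of those hosts.
    """
    hosts = list(out_links)
    jump = {h: (1.0 / len(seeds) if h in seeds else 0.0) for h in hosts}
    score = dict(jump)
    for _ in range(iterations):
        nxt = {h: (1.0 - damping) * jump[h] for h in hosts}
        for h in hosts:
            targets = out_links[h]
            if targets:
                share = damping * score[h] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling host: send its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[h] / len(seeds)
        score = nxt
    return score


def reverse_graph(out_links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Flip every edge so scores flow from a host to the hosts linking to it."""
    reversed_links: Dict[str, List[str]] = {h: [] for h in out_links}
    for source, targets in out_links.items():
        for target in targets:
            reversed_links[target].append(source)
    return reversed_links


# Hypothetical toy host graph: forum.example links out to a spam host.
graph = {
    "good.example": ["news.example"],
    "news.example": ["good.example", "forum.example"],
    "forum.example": ["news.example", "spam.example"],
    "spam.example": [],
}

# White score: trust flows forward from a known-good seed.
white_scores = seed_biased_pagerank(graph, {"good.example"})

# Spam score (assumed direction): distrust flows backward from a spam seed.
spam_scores = seed_biased_pagerank(reverse_graph(graph), {"spam.example"})

# Two link-based features per host, ready to feed a classifier.
features = {h: (white_scores[h], spam_scores[h]) for h in graph}
```

In this toy graph the host that links to spam (`forum.example`) ends up with a higher spam-seeded score than `good.example`, which is the kind of signal a classifier for spam link generators could use. The same pair of scores could also be averaged over a host's neighbors to approximate the neighbor-based features the abstract mentions.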