Improving web spam classifiers using link structure

  • Authors:
  • Qingqing Gan;Torsten Suel

  • Affiliations:
  • Polytechnic University, Brooklyn, NY;Polytechnic University, Brooklyn, NY

  • Venue:
  • AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.