Improving spamdexing detection via a two-stage classification strategy

  • Authors:
  • Guang-Gang Geng;Chun-Heng Wang;Qiu-Dan Li

  • Affiliations:
  • Key Laboratory of Complex System and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China;Key Laboratory of Complex System and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China;Key Laboratory of Complex System and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China

  • Venue:
  • AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
  • Year:
  • 2008

Abstract

Spamdexing is any of various methods used to manipulate the relevance or prominence of resources indexed by a search engine, usually in a manner inconsistent with the purpose of the indexing system. Combating spamdexing has become one of the top challenges for web search. Machine-learning-based methods have proven advantageous because they are easy to adapt to newly developed spam techniques. In this paper, we propose a two-stage classification strategy for detecting web spam, based on the spamicity predicted by learning algorithms and on hyperlink propagation. Preliminary experiments on the standard WEBSPAM-UK2006 benchmark show that the two-stage strategy is reasonable and effective.
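
The abstract does not spell out how the two stages fit together, so the following is only a minimal sketch of one way such a pipeline could look, not the authors' method. It assumes a first-stage classifier has already produced a spamicity score in [0, 1] for each host, and that the hyperlink graph is available as (source, target) host pairs. The function names, the choice of in-link and out-link averages as propagated features, and the fixed weights are all hypothetical; in practice the second stage would more likely be a trained classifier than a weighted average.

```python
from collections import defaultdict


def neighbor_features(spamicity, links):
    """Build (own, mean in-link, mean out-link) spamicity per host.

    spamicity: dict host -> first-stage score in [0, 1] (assumed given).
    links: iterable of (source, target) host pairs (assumed graph format).
    """
    in_nb = defaultdict(set)
    out_nb = defaultdict(set)
    for src, dst in links:
        out_nb[src].add(dst)
        in_nb[dst].add(src)

    def mean(hosts, default):
        # Average over neighbours that have a first-stage score;
        # fall back to the host's own score when there are none.
        scores = [spamicity[h] for h in hosts if h in spamicity]
        return sum(scores) / len(scores) if scores else default

    return {
        host: (own, mean(in_nb[host], own), mean(out_nb[host], own))
        for host, own in spamicity.items()
    }


def second_stage_score(features, weights=(0.5, 0.25, 0.25)):
    """Hypothetical stand-in for a second-stage classifier:
    a fixed weighted average of the three features."""
    return {
        host: sum(w * f for w, f in zip(weights, feats))
        for host, feats in features.items()
    }


# Toy example with made-up scores and links.
spamicity = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.5}
links = [("a", "b"), ("b", "a"), ("c", "d"), ("a", "d")]
scores = second_stage_score(neighbor_features(spamicity, links))
```

In this toy graph, hosts "a" and "b" link to each other and both have high first-stage scores, so their second-stage scores stay high, while "c" is pulled up slightly because it links to a host with a higher score.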