Improving spamdexing detection via a two-stage classification strategy

Authors:
Guang-Gang Geng; Chun-Heng Wang; Qiu-Dan Li
Affiliations:
Key Laboratory of Complex System and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China (all three authors)
Venue:
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
Year:
2008


Abstract

Spamdexing is any of various methods for manipulating the relevance or prominence of resources indexed by a search engine, usually in a manner inconsistent with the purpose of the indexing system. Combating spamdexing has become one of the top challenges for web search. Machine-learning-based methods have proven advantageous because they adapt easily to newly developed spam techniques. In this paper, we propose a two-stage classification strategy for detecting web spam that builds on the spamicity predicted by learning algorithms and on hyperlink propagation. Preliminary experiments on the standard WEBSPAM-UK2006 benchmark show that the two-stage strategy is reasonable and effective.
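The abstract describes the strategy only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a stacking-style design. Stage 1 trains a classifier on per-host content features and predicts a spamicity score for every host. Stage 2 propagates those scores over the host graph by taking the mean spamicity of each host's out-linked and in-linking neighbours, appends them as new features, and retrains. The plain logistic-regression learner, the helper names (`train_logreg`, `neighbour_spamicity`, `two_stage`), the 0.5 fallback for hosts without neighbours, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Batch gradient-descent logistic regression; returns [bias, w1, ..., wd]."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    # Predicted spamicity (probability of the spam class) for each host.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def neighbour_spamicity(scores: Sequence[float], edges: Sequence[Tuple[int, int]],
                        n: int) -> List[List[float]]:
    """Mean stage-1 spamicity of each host's out-link and in-link neighbours."""
    out_sum, out_cnt = [0.0] * n, [0] * n
    in_sum, in_cnt = [0.0] * n, [0] * n
    for src, dst in edges:  # a directed hyperlink src -> dst between hosts
        out_sum[src] += scores[dst]
        out_cnt[src] += 1
        in_sum[dst] += scores[src]
        in_cnt[dst] += 1
    # Hosts with no neighbours in a direction get 0.5, an uninformative value (assumption).
    return [[out_sum[i] / out_cnt[i] if out_cnt[i] else 0.5,
             in_sum[i] / in_cnt[i] if in_cnt[i] else 0.5] for i in range(n)]


def two_stage(X: Sequence[Sequence[float]], y: Sequence[int],
              edges: Sequence[Tuple[int, int]]) -> List[float]:
    # Stage 1: content-only classifier yields a spamicity score per host.
    s1 = predict(train_logreg(X, y), X)
    # Stage 2: append propagated neighbour spamicity and retrain on the augmented features.
    X2 = [list(xi) + nb for xi, nb in zip(X, neighbour_spamicity(s1, edges, len(X)))]
    return predict(train_logreg(X2, y), X2)


if __name__ == "__main__":
    # Toy data: hosts 0-2 are spam and link among themselves; hosts 3-5 are normal.
    X = [[0.9], [0.8], [0.7], [0.1], [0.2], [0.3]]
    y = [1, 1, 1, 0, 0, 0]
    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
    print([round(s, 3) for s in two_stage(X, y, edges)])
```

For brevity the sketch computes stage-1 scores on the same hosts it trains on. A more careful pipeline would feed the stage-2 learner out-of-fold (cross-validated) stage-1 predictions for the training hosts, so it does not learn from over-confident, leaked spamicity values.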