Link based small sample learning for web spam detection

Authors:
Guang-Gang Geng; Qiudan Li; Xinchang Zhang
Affiliations:
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Venue:
Proceedings of the 18th international conference on World Wide Web
Year:
2009

Abstract

Robust web spam detection systems based on statistical learning often require large amounts of labeled training data. However, labeled samples are more difficult, expensive, and time-consuming to obtain than unlabeled ones. This paper proposes link-based semi-supervised learning algorithms that boost the performance of a classifier by integrating traditional self-training with link learning based on topological dependency. Experiments with only a few labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
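The abstract names only the ingredients, so the following is a minimal sketch of how self-training can be combined with link-based smoothing. It is not the authors' algorithm. The base classifier (nearest centroid), the neighbor-averaging step, and the parameters `rounds`, `per_round`, and `alpha` are all hypothetical choices made for illustration. The sketch only encodes the topological-dependency idea that linked hosts tend to share labels. In each round it scores every host, blends each score with the mean score of its linked neighbors (known labels are pinned to 0 or 1), and then promotes the most confident unlabeled hosts into the labeled set.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def centroid(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classifier_scores(features: Dict[str, Vector], labels: Dict[str, int]) -> Dict[str, float]:
    """Hypothetical nearest-centroid base classifier: spam score in [0, 1].

    Assumes `labels` contains at least one spam (1) and one normal (0) host.
    """
    c_spam = centroid([features[h] for h, y in labels.items() if y == 1])
    c_normal = centroid([features[h] for h, y in labels.items() if y == 0])
    scores = {}
    for h, x in features.items():
        d_s, d_n = dist(x, c_spam), dist(x, c_normal)
        # Closer to the spam centroid -> score nearer 1.
        scores[h] = d_n / (d_s + d_n) if d_s + d_n > 0 else 0.5
    return scores


def link_smooth(scores: Dict[str, float], links: Dict[str, List[str]], alpha: float) -> Dict[str, float]:
    """Blend each host's score with the mean score of its linked neighbors."""
    smoothed = {}
    for h, s in scores.items():
        nbrs = links.get(h, [])
        if nbrs:
            nbr_mean = sum(scores[n] for n in nbrs) / len(nbrs)
            smoothed[h] = alpha * s + (1 - alpha) * nbr_mean
        else:
            smoothed[h] = s
    return smoothed


def scored(features, links, labels, alpha):
    base = classifier_scores(features, labels)
    base.update({h: float(y) for h, y in labels.items()})  # pin known labels
    return link_smooth(base, links, alpha)


def link_self_train(features, links, labels, rounds=5, per_round=2, alpha=0.7):
    labels = dict(labels)  # do not mutate the caller's seed labels
    for _ in range(rounds):
        unlabeled = [h for h in features if h not in labels]
        if not unlabeled:
            break
        scores = scored(features, links, labels, alpha)
        # Confidence = distance from the 0.5 decision boundary.
        ranked = sorted(unlabeled, key=lambda h: abs(scores[h] - 0.5), reverse=True)
        for h in ranked[:per_round]:
            labels[h] = 1 if scores[h] >= 0.5 else 0
    scores = scored(features, links, labels, alpha)
    return {h: labels.get(h, 1 if scores[h] >= 0.5 else 0) for h in features}


# Toy example: two seed labels (a = spam, b = normal) and a small link graph.
features = {"a": (0.9, 0.8), "b": (0.1, 0.2), "c": (0.8, 0.9),
            "d": (0.2, 0.1), "e": (0.7, 0.6), "f": (0.3, 0.2)}
links = {"c": ["a", "e"], "e": ["c"], "d": ["b", "f"], "f": ["d"]}
seed = {"a": 1, "b": 0}
result = link_self_train(features, links, seed)
print(result)  # → {'a': 1, 'b': 0, 'c': 1, 'd': 0, 'e': 1, 'f': 0}
```

Pinning labeled neighbors to 0 or 1 before smoothing is one simple way to let known labels propagate along links. Other weighting schemes, such as weighting neighbors by link count, would fit the same loop.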