Robust web spam detection systems based on statistical learning often require large amounts of labeled training data. However, labeled samples are more difficult, expensive, and time-consuming to obtain than unlabeled ones. This paper proposes link-based semi-supervised learning algorithms that boost the performance of a classifier by integrating traditional self-training with link learning based on topological dependency. Experiments with a small number of labeled samples on the standard WEBSPAM-UK2006 benchmark show that the algorithms are effective.
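The general idea of combining self-training with link information can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It assumes a nearest-centroid stand-in classifier, a hypothetical mixing weight `alpha` between a page's own score and the mean score of its linked neighbors, and a hypothetical confidence `threshold` for promoting unlabeled pages to the training set. The function names and the toy data are likewise invented for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def spam_scores(features, labels):
    """Stand-in classifier (assumption): nearest-centroid spam score in [0, 1]."""
    spam_c = centroid([features[n] for n, y in labels.items() if y == 1])
    ham_c = centroid([features[n] for n, y in labels.items() if y == 0])
    scores = {}
    for node, x in features.items():
        d_spam, d_ham = dist(x, spam_c), dist(x, ham_c)
        total = d_spam + d_ham
        # Closer to the spam centroid means a score nearer to 1.
        scores[node] = 0.5 if total == 0 else d_ham / total
    return scores


def link_self_training(features, links, seed_labels,
                       alpha=0.5, threshold=0.7, max_rounds=10):
    """Self-training in which neighbor scores adjust each page's confidence.

    alpha and threshold are hypothetical parameters, not taken from the paper.
    Labels: 1 = spam, 0 = non-spam. Every linked node must appear in features.
    """
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        scores = spam_scores(features, labels)
        # Known labels override classifier scores as neighbor evidence.
        evidence = {n: float(labels[n]) if n in labels else s
                    for n, s in scores.items()}
        newly_labeled = {}
        for node in features:
            if node in labels:
                continue
            nbrs = links.get(node, [])
            if nbrs:
                nbr_mean = sum(evidence[m] for m in nbrs) / len(nbrs)
            else:
                nbr_mean = scores[node]  # no links: fall back to own score
            combined = (1 - alpha) * scores[node] + alpha * nbr_mean
            if combined >= threshold:
                newly_labeled[node] = 1
            elif combined <= 1 - threshold:
                newly_labeled[node] = 0
        if not newly_labeled:
            break  # nothing confident enough to add; stop early
        labels.update(newly_labeled)
    return labels


# Toy data (invented): one content feature per page, plus outgoing links.
features = {"a": [0.9], "b": [0.8], "c": [0.1], "d": [0.2], "e": [0.5]}
links = {"a": ["b"], "b": ["a", "e"], "c": ["d"], "d": ["c"], "e": ["b", "a"]}
result = link_self_training(features, links, seed_labels={"a": 1, "c": 0})
```

In this toy graph, page `e` sits exactly between the two centroids, so its content score alone is uninformative. Its links to pages that look like spam are what push its combined score over the threshold, which is the kind of topological dependency the abstract refers to.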