Anti-spam has become one of the top challenges for Web search. In this paper, we treat web spam detection as a binary classification problem. Because reputable pages are easier to obtain than spam pages on the Web, we adopt an ensemble under-sampling classification strategy that takes full advantage of the information contained in the large number of reputable websites. The strategy is based on the spamicity predicted by each sub-classifier, and both content-based and link-based features are taken into account. Experiments on the standard WEBSPAM-UK2006 benchmark show that the ensemble strategy effectively improves web spam detection performance.
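The ensemble under-sampling idea can be illustrated with a minimal sketch. The abstract does not name the base learner, the way the reputable set is partitioned, or the rule for combining sub-classifier outputs, so everything below is an assumption made for illustration: the reputable pages are split into disjoint chunks the size of the spam set, each chunk is paired with all spam pages to train one sub-classifier, a simple nearest-centroid scorer stands in for the real learner, and the ensemble spamicity is the plain average of the sub-classifier scores. The function names are hypothetical, and each feature vector is assumed to already concatenate content-based and link-based features.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def _centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_centroid_scorer(spam: List[Vector], normal: List[Vector]) -> Scorer:
    """Hypothetical base learner (an assumption, not the paper's choice).

    Returns a function mapping a feature vector to a spamicity in [0, 1],
    derived from its distances to the spam and reputable centroids.
    """
    c_spam, c_norm = _centroid(spam), _centroid(normal)

    def score(x: Vector) -> float:
        d_spam = math.dist(x, c_spam)
        d_norm = math.dist(x, c_norm)
        total = d_spam + d_norm
        # Closer to the spam centroid gives a score nearer 1.0.
        return 0.5 if total == 0 else d_norm / total

    return score


def train_undersampling_ensemble(
    spam: List[Vector], normal: List[Vector], seed: int = 0
) -> List[Scorer]:
    """Train one sub-classifier per balanced (spam, reputable-chunk) pair."""
    if not spam or not normal:
        raise ValueError("both classes need at least one example")
    rng = random.Random(seed)
    shuffled = list(normal)
    rng.shuffle(shuffled)
    size = len(spam)
    # Leftover reputable pages that do not fill a whole chunk are dropped.
    n_parts = max(1, len(shuffled) // size)
    return [
        train_centroid_scorer(spam, shuffled[i * size:(i + 1) * size])
        for i in range(n_parts)
    ]


def ensemble_spamicity(scorers: List[Scorer], x: Vector) -> float:
    """Average the sub-classifier scores (assumed combination rule)."""
    return sum(s(x) for s in scorers) / len(scorers)
```

With two spam examples and six reputable examples, this sketch trains three sub-classifiers, so every reputable example contributes to exactly one of them rather than being discarded, which is the motivation the abstract gives for the ensemble. A page would then be labeled spam when `ensemble_spamicity` exceeds a chosen threshold such as 0.5, which is again an assumed value.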