In this paper, we address the question of how to identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new spam links are created by them. By monitoring spam link generators, we can quickly find emerging web spam, which can then be used to update existing spam filters. To classify spam link generators, we investigate various link-based features, including modified PageRank scores computed from white (non-spam) and spam seed sets, as well as these scores for neighboring hosts. An online learning algorithm is used to handle the large-scale data, and the effectiveness of the different features is examined. Experiments on three yearly archives of the Japanese Web show that spam link generators can be predicted with reasonable performance.
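The following is a minimal sketch of the kind of pipeline the abstract describes. It is not the authors' implementation, and every name in it (`seeded_pagerank`, `reverse`, `neighbor_mean`, `host_features`, `PassiveAggressive`) is hypothetical. Several assumptions are made: the host graph is a plain adjacency dict; the "modified PageRank" is illustrated as PageRank whose teleport vector is restricted to a seed set (TrustRank-style for white seeds, and run on the reversed graph for spam seeds, so that score flows back to hosts that link to spam); the neighbor features are simple means over out-linked hosts; and the online learner is assumed to be the PA-I variant of the Passive-Aggressive algorithm.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # host -> list of hosts it links to


def reverse(graph: Graph) -> Graph:
    """Transpose the graph: every edge u->v becomes v->u."""
    rev: Graph = {u: [] for u in graph}
    for u, outs in graph.items():
        for v in outs:
            rev.setdefault(v, []).append(u)
    return rev


def seeded_pagerank(graph: Graph, seeds: List[str],
                    damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """PageRank whose random jumps (and dangling mass) go only to the seeds."""
    # Make sure every link target is also a key (with no out-links if unknown).
    graph = {**{v: [] for outs in graph.values() for v in outs}, **graph}
    seed_set = [s for s in seeds if s in graph]
    if not seed_set:
        raise ValueError("no seed host appears in the graph")
    teleport = {n: 0.0 for n in graph}
    for s in seed_set:
        teleport[s] = 1.0 / len(seed_set)
    score = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in graph}
        dangling = 0.0
        for u, outs in graph.items():
            if outs:
                share = damping * score[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                dangling += damping * score[u]
        # Send mass from hosts without out-links back to the seeds,
        # so the scores keep summing to 1.
        for n in graph:
            nxt[n] += dangling * teleport[n]
        score = nxt
    return score


def neighbor_mean(graph: Graph, score: Dict[str, float], host: str) -> float:
    """Mean score of the hosts that `host` links to (0.0 if it has none)."""
    outs = graph.get(host, [])
    return sum(score[v] for v in outs) / len(outs) if outs else 0.0


def host_features(graph: Graph, white: Dict[str, float],
                  spam_back: Dict[str, float], host: str) -> List[float]:
    """Own scores, neighbor scores, and a constant bias term."""
    return [white[host], spam_back[host],
            neighbor_mean(graph, white, host),
            neighbor_mean(graph, spam_back, host),
            1.0]


class PassiveAggressive:
    """Binary PA-I classifier with labels in {+1, -1}."""

    def __init__(self, dim: int, C: float = 1.0):
        self.w = [0.0] * dim
        self.C = C  # aggressiveness cap on each step

    def _dot(self, x: List[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x: List[float]) -> int:
        return 1 if self._dot(x) >= 0 else -1

    def update(self, x: List[float], y: int) -> None:
        loss = max(0.0, 1.0 - y * self._dot(x))  # hinge loss
        norm_sq = sum(xi * xi for xi in x)
        if loss > 0 and norm_sq > 0:
            tau = min(self.C, loss / norm_sq)  # PA-I step size
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
```

For example, on a toy graph where `gen` links into a spam cluster, `seeded_pagerank(reverse(graph), ["spam1", "spam2"])` gives `gen` a positive spam-side score while hosts that never reach spam score zero. Each labeled host's `host_features(...)` vector is then streamed through `update` once. Processing one example at a time is what makes an online learner practical for the large-scale data mentioned above.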