Challenges in web search engines
ACM SIGIR Forum
Identifying link farm spam pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
MailRank: using ranking for spam detection
Proceedings of the 14th ACM international conference on Information and knowledge management
Topical TrustRank: using topicality to combat web spam
Proceedings of the 15th international conference on World Wide Web
Link spam detection based on mass estimation
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Extracting link spam using biased random walks from spam seed sets
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
DiffusionRank: a possible penicillin for web spamming
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Larger is better: seed selection in link-based anti-spamming algorithms
Proceedings of the 17th international conference on World Wide Web
BrowseRank: letting web users vote for page importance
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Looking into the past to better classify web spam
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Automatic seed set expansion for trust propagation based anti-spam algorithms
Information Sciences: an International Journal
Combating Web spam through trust-distrust propagation with confidence
Pattern Recognition Letters
Hi-index | 0.01 |
Seed sets are of significant importance for trust propagation based anti-spamming algorithms, e.g., TrustRank. Conventional approaches require manual evaluation to construct a seed set, which restricts the seed set to be small in size, since it would cost too much and may even be impossible to construct a very large seed set manually. The small-sized seed set can cause detrimental effect on the final ranking results. Thus, it is desirable to automatically expand an initial seed set to a much larger one. In this paper, we propose the first automatic seed set expansion algorithm (ASE), which expands a small seed set by selecting reputable seeds that are found and guaranteed to be reputable through a joint recommendation link structure. Experimental results on the WEBSPAM-2007 dataset show that with the same manual evaluation efforts, ASE can automatically obtain a large number of reputable seeds with high precision, thus significantly improving the performance of the baseline algorithm in terms of both reputable site promotion and spam site demotion.