Automatic seed set expansion for trust propagation based anti-spamming algorithms
Proceedings of the eleventh international workshop on Web information and data management
Seed sets are of significant importance to trust-propagation-based anti-spam algorithms such as TrustRank. Conventional approaches construct the seed set by manual evaluation, which keeps it small: building a very large seed set by hand is too costly and may even be infeasible. Small seed sets, in turn, degrade the final ranking results. It is therefore desirable to expand an initial seed set automatically into a larger one. In this paper, we propose an automatic seed set expansion algorithm (ASE) that enriches a small seed set into a much larger one. The intuition behind ASE is that if a page is recommended by a number of trustworthy pages, the page itself should be trustworthy as well. Since links on the Web can be regarded as a means of conveying recommendation, we call a set of links recommending the same page a joint recommendation link structure. ASE uses joint recommendation link structures with sufficiently large support degrees to obtain new seeds. It can be proved that, with a suitable support degree, the probability of selecting a spam page as a new seed is close to zero, so the quality of the expanded seed set can be guaranteed. Experimental results on the WEBSPAM-UK2007 dataset show that, with the same manual evaluation effort, ASE automatically obtains a large number of high-quality reputable seeds and significantly improves the performance of trust-propagation-based algorithms such as TrustRank and CPV (Computing Page Values).
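The intuition behind joint recommendation can be illustrated with a minimal sketch. It is not the ASE algorithm as specified in the paper: the abstract does not define the support degree precisely, so the sketch assumes it is the number of distinct trusted pages that link to a candidate page. The function name `expand_seeds`, the parameters `min_support` and `max_rounds`, and the toy graph are all hypothetical and serve only as illustration.

```python
from collections import Counter


def expand_seeds(out_links, seeds, min_support, max_rounds=1):
    """Grow a trusted seed set with a support-count rule (illustrative only).

    out_links   : dict mapping a page to the pages it links to
    seeds       : iterable of manually evaluated trustworthy pages
    min_support : assumed support threshold, i.e. how many distinct
                  trusted pages must link to a page before it is accepted
    max_rounds  : how many times to repeat the expansion (an assumption;
                  the abstract does not say whether ASE iterates)
    """
    trusted = set(seeds)
    for _ in range(max_rounds):
        # For every page not yet trusted, count the distinct trusted
        # pages that link to it.
        support = Counter()
        for page in trusted:
            for target in set(out_links.get(page, ())):
                if target not in trusted:
                    support[target] += 1
        # Accept only pages whose support reaches the threshold.
        new_seeds = {p for p, s in support.items() if s >= min_support}
        if not new_seeds:
            break
        trusted |= new_seeds
    return trusted


# Toy graph (hypothetical data): page -> pages it links to.
toy_links = {
    "a": ["x", "y"],
    "b": ["x", "y"],
    "c": ["x", "spam"],
    "x": ["z"],
    "y": ["z"],
}
expanded = expand_seeds(toy_links, {"a", "b", "c"}, min_support=2, max_rounds=2)
print(sorted(expanded))
```

In this toy graph the page `spam` is linked from only one trusted page, so it never reaches the threshold of 2, while `x`, `y`, and (in the second round) `z` are added. A higher threshold admits fewer new seeds but makes it harder for a spam page to slip in, which is the trade-off the notion of a suitable support degree refers to.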