Automatic seed set expansion for trust propagation based anti-spam algorithms

Authors:
Xianchao Zhang;Wenxin Liang;Shaoping Zhu;Bo Han
Affiliations:
School of Software, Dalian University of Technology, Economy and Technology Development Area, 116620 Dalian, China;School of Software, Dalian University of Technology, Economy and Technology Development Area, 116620 Dalian, China;School of Software, Dalian University of Technology, Economy and Technology Development Area, 116620 Dalian, China;Department of Computer Science and Software Engineering, University of Melbourne, ICT 6.05, 111 Barry St. Carlton, VIC 3010, Australia
Venue:
Information Sciences: an International Journal
Year:
2013


Abstract

Seed sets are of significant importance to trust propagation based anti-spam algorithms such as TrustRank. Conventional approaches construct the seed set by manual evaluation, which keeps the seed set small, since building a very large seed set by hand is too costly and may even be impossible. Such small seed sets degrade the final ranking results, so it is desirable to automatically expand an initial seed set into a larger one. In this paper, we propose an automatic seed set expansion algorithm (ASE) that enriches a small seed set into a much larger one. The intuition behind ASE is that if a page is recommended by a number of trustworthy pages, the page itself should be trustworthy as well. Since links on the Web can be considered a means of conveying recommendation, we call a set of links recommending the same page a joint recommendation link structure. ASE uses joint recommendation link structures with sufficiently large support degrees to obtain new seeds. It can be proved that, with a suitable support degree, the probability of selecting a spam page as a new seed is almost zero, so the quality of the expanded seed set can be guaranteed. Experimental results on the WEBSPAM-UK2007 dataset show that, with the same manual evaluation effort, ASE automatically obtains a large number of high-quality reputable seeds and significantly improves the performance of trust propagation based algorithms such as TrustRank and CPV (Computing Page Values).
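
The joint recommendation idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the support degree precisely, so the code below assumes it is the number of distinct seed pages that link to a candidate page, and it performs a single expansion pass. The function name `expand_seed_set`, the parameter `min_support`, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def expand_seed_set(out_links, seeds, min_support):
    """Return the seed set enlarged by jointly recommended pages.

    out_links   -- dict mapping each page to an iterable of pages it links to
    seeds       -- iterable of manually evaluated trustworthy pages
    min_support -- assumed meaning of "support degree": the minimum number of
                   distinct seeds that must link to a page for it to be added
    """
    seeds = set(seeds)
    # For every non-seed page, collect the distinct seeds that recommend it.
    recommenders = defaultdict(set)
    for seed in seeds:
        for target in out_links.get(seed, ()):
            if target not in seeds:
                recommenders[target].add(seed)
    # Keep only candidates whose joint recommendation is strong enough.
    new_seeds = {
        page for page, linking_seeds in recommenders.items()
        if len(linking_seeds) >= min_support
    }
    return seeds | new_seeds


# Toy graph (hypothetical): a, b and c are trusted; s is an unlabelled page.
graph = {
    "a": ["x", "y"],
    "b": ["x", "y"],
    "c": ["x", "z"],
    "s": ["z"],
}
initial_seeds = {"a", "b", "c"}

print(sorted(expand_seed_set(graph, initial_seeds, min_support=3)))
print(sorted(expand_seed_set(graph, initial_seeds, min_support=2)))
```

With a threshold of 3 only `x`, which all three seeds link to, joins the seed set; lowering the threshold to 2 also admits `y`. Page `z` is never added, because links from the unlabelled page `s` do not count as recommendations. A higher threshold trades seed set size for a lower risk of admitting spam, which is the trade-off the abstract describes.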