Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Trawling the Web for emerging cyber-communities
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Efficient identification of Web communities
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Modern Information Retrieval
The connectivity sonar: detecting site functionality by structural patterns
Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
Proceedings of the 13th international conference on World Wide Web
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
ACM Transactions on Internet Technology (TOIT)
Identifying link farm spam pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Topical TrustRank: using topicality to combat web spam
Proceedings of the 15th international conference on World Wide Web
Proceedings of the 15th international conference on World Wide Web
Using spam farm to boost PageRank
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Adversarial Information Retrieval on the Web (AIRWeb 2007)
ACM SIGIR Forum
Automatic seed set expansion for trust propagation based anti-spamming algorithms
Proceedings of the eleventh international workshop on Web information and data management
On the robustness of google scholar against spam
Proceedings of the 21st ACM conference on Hypertext and hypermedia
Foundations and Trends in Information Retrieval
Detecting Fake Medical Web Sites Using Recursive Trust Labeling
ACM Transactions on Information Systems (TOIS)
Automatic seed set expansion for trust propagation based anti-spam algorithms
Information Sciences: an International Journal
Hi-index | 0.00 |
Link spam deliberately manipulates hyperlinks between web pages in order to unduly boost the search engine ranking of one or more target pages. Link based ranking algorithms such as PageRank, HITS, and other derivatives are especially vulnerable to link spam. Link farms and link exchanges are two common instances of link spam that produce spam communities -- i.e., clusters in the web graph. In this paper, we present a directed approach to extracting link spam communities when given one or more members of the community. In contrast to previous completely automated approaches to finding link spam, our method is specifically designed to be used interactively. Our approach starts with a small spam seed set provided by the user and simulates a random walk on the web graph. The random walk is biased to explore the local neighborhood around the seed set through the use of decay probabilities. Truncation is used to retain only the most frequently visited nodes. After termination, the nodes are sorted in decreasing order of their final probabilities and presented to the user. Experiments using manually labeled link spam data sets and random walks from a single seed domain show that the approach achieves over 95.12% precision in extracting large link farms and 80.46% precision in extracting link exchange centroids.