Link spam refers to attempts to promote the ranking of spammers' web sites by deceiving the link-based ranking algorithms of search engines. Spammers often create densely connected link structures among sites, so-called "link farms". In this paper, we study the overall structure and distribution of link farms in a large-scale graph of the Japanese Web with 5.8 million sites and 283 million links. To examine the spam structure, we apply three graph algorithms to the web graph. First, we decompose the web graph into strongly connected components (SCCs). Besides the largest SCC (the core) at the center of the web, we observe that most of the large components consist of link farms. Next, to extract spam sites in the core, we enumerate maximal cliques as seeds of link farms. Finally, using these link farms as a reliable spam seed set, we expand them with a minimum cut technique that cuts the links between spam and non-spam sites. We found about 0.6 million spam sites in SCCs around the core, and extracted an additional 8 thousand and 49 thousand spam sites in the core with high precision by maximal clique enumeration and by the minimum cut technique, respectively.
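The three steps above (SCC decomposition, maximal clique enumeration, minimum cut expansion) can be illustrated on a toy graph. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the site names, the `trusted` set, the virtual `SRC`/`SINK` nodes, the unit link capacities, and the choice to look for cliques among mutually linked sites are all hypothetical. The textbook algorithms used (Kosaraju, Bron-Kerbosch with pivoting, Edmonds-Karp) are stand-ins and are not claimed to be the ones used in the paper.

```python
from collections import defaultdict, deque

def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps node -> set of successors."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    reverse = defaultdict(set)
    for u, vs in graph.items():
        for v in vs:
            reverse[v].add(u)

    def dfs_postorder(adj, start, seen):
        # Iterative DFS that returns the nodes it reaches in post-order.
        order, stack = [], [(start, iter(adj.get(start, ())))]
        seen.add(start)
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(adj.get(nxt, ()))))
                    break
            else:
                stack.pop()
                order.append(node)
        return order

    seen, finish = set(), []
    for n in sorted(nodes):
        if n not in seen:
            finish.extend(dfs_postorder(graph, n, seen))
    seen, components = set(), []
    for n in reversed(finish):  # second pass on the reversed graph
        if n not in seen:
            components.append(set(dfs_postorder(reverse, n, seen)))
    return components

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of undirected neighbours."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns the nodes on the source side of a minimum cut."""
    cap = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        cap[u][v] += c
        cap[v][u] += 0  # make sure the residual (reverse) edge exists
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:  # no augmenting path left
            return set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

# Hypothetical toy web graph: site -> set of sites it links to.
web = {
    "f1": {"f2", "s1"}, "f2": {"f3"}, "f3": {"f1"},
    "hub": {"news"}, "news": {"hub", "blog"}, "blog": {"hub", "s1"},
    "s1": {"s2", "s3", "s4"}, "s2": {"s1", "s3", "s4"},
    "s3": {"s1", "s2", "hub"}, "s4": {"s1"},
}

# Step 1: the largest SCC is the core; other non-trivial SCCs are farm candidates.
components = sorted(strongly_connected_components(web), key=len, reverse=True)
core = components[0]
outer_farms = [c for c in components[1:] if len(c) > 1]

# Step 2: maximal cliques of mutually linked core sites serve as spam seeds.
mutual = {u: {v for v in web[u] if v in core and u in web.get(v, ())} for u in core}
seeds = set().union(*(c for c in maximal_cliques(mutual) if len(c) >= 3))

# Step 3: cut the core between the seeds and a hypothetical trusted set.
INF = float("inf")
trusted = {"hub", "news"}  # assumption: known non-spam sites
capacity = {(u, v): 1 for u in core for v in web[u] if v in core}
capacity.update({("SRC", s): INF for s in seeds})
capacity.update({(t, "SINK"): INF for t in trusted})
spam = min_cut_source_side(capacity, "SRC", "SINK") - {"SRC"}
print(sorted(outer_farms[0]), sorted(seeds), sorted(spam))
```

On this toy graph the cycle `f1`, `f2`, `f3` falls outside the core as its own component, the mutually linked sites `s1`, `s2`, `s3` form the clique seed, and the cut additionally pulls in `s4`, which is reachable from the seeds but separated from the trusted sites by a single link.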