A study of link farm distribution and evolution using a time series of web snapshots

Authors:
Young-joo Chung;Masashi Toyoda;Masaru Kitsuregawa
Affiliations:
University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan;University of Tokyo, Tokyo, Japan
Venue:
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Year:
2009

Citing 14
Cited 14

The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Graph structure in the Web

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Breadth-first crawling yields high-quality pages

Proceedings of the 10th international conference on World Wide Web
Creating a Web community chart for navigating related communities

Proceedings of the 12th ACM conference on Hypertext and Hypermedia
Extracting evolution of web communities from a series of web archives

Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages

Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Site level noise removal for search engines

Proceedings of the 15th international conference on World Wide Web
A reference collection for web spam

ACM SIGIR Forum
A large-scale study of link spam detection by graph algorithms

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Measuring similarity to detect qualified links

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Combating web spam with trustrank

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Socio-sense: a system for analysing the societal behavior from long term web archive

APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development

Web spam challenge proposal for filtering in archives

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Identifying spam link generators for monitoring emerging web spam

Proceedings of the 4th workshop on Information credibility
Freshness matters: in flowers, food, and web authority

Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Temporal query log profiling to improve web search ranking

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Mining useful time graph patterns on extensively discussed topics on the web

DASFAA'10 Proceedings of the 15th international conference on Database systems for advanced applications
Let web spammers expose themselves

Proceedings of the fourth ACM international conference on Web search and data mining
Web spam classification: a few features worth more

Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality
Adversarial Web Search

Foundations and Trends in Information Retrieval
Using patterns in the behavior of the random surfer to detect webspam beneficiaries

WISS'10 Proceedings of the 2010 international conference on Web information systems engineering
Webspam demotion: Low complexity node aggregation methods

Neurocomputing
Using site-level connections to estimate link confidence

Journal of the American Society for Information Science and Technology
Evaluating web archive search systems

WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Detecting Webspam Beneficiaries Using Information Collected by the Random Surfer

International Journal of Organizational and Collective Intelligence
Combating Web spam through trust-distrust propagation with confidence

Pattern Recognition Letters

Abstract

In this paper, we study the overall structure of link-based spam and its evolution, which should help in developing robust analysis tools and in researching Web spamming as a social activity in cyberspace. First, we use strongly connected component (SCC) decomposition to separate many link farms from the largest SCC, the so-called core. We show that denser link farms in the core can be extracted by node filtering and recursive application of SCC decomposition to the core. Surprisingly, new large link farms are found in each iteration, and this trend continues for at least 10 iterations. In addition, we measure the spamicity of these link farms. Next, we examine the evolution of link farms over two years. The results show that almost all large link farms stop growing, some of them shrink, and many large link farms are created in one year.
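
The recursive procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the filtering rule, so the degree threshold used here (which rises by one per iteration) is an assumption, as are the function names `strongly_connected_components` and `peel_link_farm_candidates` and the parameters `min_size` and `max_iterations`. The sketch decomposes the current core into SCCs, records every non-largest component as a link farm candidate, filters low-degree nodes out of the largest component, and repeats.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm, written iteratively. Returns a list of node sets."""
    nodes = set(nodes)
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if u in nodes and v in nodes:  # ignore edges leaving the node set
            fwd[u].append(v)
            rev[v].append(u)

    # Pass 1: record nodes in order of DFS finishing time on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: traverse the reversed graph in decreasing finishing time.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def peel_link_farm_candidates(nodes, edges, min_size=3, max_iterations=10):
    """Repeat SCC decomposition and node filtering on the largest SCC.

    Returns (found, core): `found` lists (iteration, component) pairs for the
    non-largest SCCs with at least `min_size` nodes, and `core` is the node
    set left at the end.
    """
    core, found = set(nodes), []
    for iteration in range(1, max_iterations + 1):
        components = strongly_connected_components(core, edges)
        largest = max(components, key=len)
        for component in components:
            if component is not largest and len(component) >= min_size:
                found.append((iteration, component))

        # Assumed filter: keep nodes whose in-degree and out-degree inside the
        # largest SCC both reach a threshold that rises with each iteration.
        threshold = iteration + 1
        indeg, outdeg = defaultdict(int), defaultdict(int)
        for u, v in edges:
            if u in largest and v in largest:
                outdeg[u] += 1
                indeg[v] += 1
        filtered = {n for n in largest
                    if indeg[n] >= threshold and outdeg[n] >= threshold}
        if not filtered:
            break
        core = filtered
    return found, core
```

Nodes would typically be hosts and edges host-level hyperlinks given as `(source, target)` pairs. Any component returned in `found` is only a candidate: deciding whether it is actually a link farm requires a separate spamicity measurement, as the abstract notes.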