In this paper, we study the overall structure of link-based spam and its evolution, which should help in developing robust analysis tools and in researching Web spamming as a social activity in cyberspace. First, we use strongly connected component (SCC) decomposition to separate many link farms from the largest SCC, the so-called core. We show that denser link farms in the core can be extracted by node filtering and recursive application of SCC decomposition to the core. Surprisingly, new large link farms are found in each iteration, and this trend continues for at least 10 iterations. In addition, we measure the spamicity of these link farms. Next, we examine the evolution of link farms over two years. The results show that almost all large link farms stop growing and some of them shrink, while many large link farms are created within one year.
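
The recursive decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the node-filtering rule, so the in-/out-degree threshold `min_degree` is an assumption, as are the function names `strongly_connected_components` and `peel_core` and the parameters `max_rounds` and `min_farm_size`. Each round splits the current graph into SCCs, treats the largest one as the core, reports the remaining SCCs as link-farm candidates, and then filters low-degree nodes out of the core before decomposing again.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm (iterative). Returns a list of node sets.

    Only edges with both endpoints in `nodes` are considered.
    """
    nodes = set(nodes)
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if u in nodes and v in nodes:
            fwd[u].append(v)
            rev[v].append(u)

    # Pass 1: record nodes in order of DFS finishing time.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all neighbours explored: node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finishing time.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    comp.add(prev)
                    stack.append(prev)
        components.append(comp)
    return components


def peel_core(nodes, edges, min_degree=2, max_rounds=10, min_farm_size=2):
    """Recursively decompose the core; `edges` must be a list of (u, v) pairs.

    Returns (farms_per_round, final_core). The degree-based filter is an
    assumed stand-in for the paper's unspecified node filtering.
    """
    current = set(nodes)
    farms_per_round = []
    for _ in range(max_rounds):
        comps = sorted(strongly_connected_components(current, edges),
                       key=len, reverse=True)
        if not comps:
            break
        core = comps[0]  # largest SCC of this round
        farms_per_round.append(
            [c for c in comps[1:] if len(c) >= min_farm_size])

        # Assumed filter: keep nodes with enough in- and out-links inside the core.
        indeg, outdeg = defaultdict(int), defaultdict(int)
        for u, v in edges:
            if u in core and v in core:
                outdeg[u] += 1
                indeg[v] += 1
        filtered = {n for n in core
                    if indeg[n] >= min_degree and outdeg[n] >= min_degree}

        current = core
        if not filtered or filtered == core:
            break  # filtering changes nothing useful; stop recursing
        current = filtered
    return farms_per_round, current
```

On a toy graph in which two dense cliques are joined only through sparsely linked pages, the first round sees a single core, and the second round (after filtering) separates one clique from the other, which mirrors the observation that new farms keep appearing in later iterations.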