Thwarting the nigritude ultramarine: learning to identify link spam

Authors:
Isabel Drost;Tobias Scheffer
Affiliations:
Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany;Department of Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
Venue:
ECML'05 Proceedings of the 16th European conference on Machine Learning
Year:
2005

Citing 12
Cited 22

The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Making large-scale support vector machine learning practical

Advances in kernel methods
Graph structure in the Web

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications networking
Who Links to Whom: Mining Linkage between Web Sites

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
A large-scale study of the evolution of web pages

WWW '03 Proceedings of the 12th international conference on World Wide Web
On the Evolution of Clusters of Near-Duplicate Web Pages

LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
Building Nutch: Open Source Search

Queue - Search Engines
Adversarial classification

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages

Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Identifying link farm spam pages

WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Combating web spam with trustrank

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Challenges in web search engines

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Topical TrustRank: using topicality to combat web spam

Proceedings of the 15th international conference on World Wide Web
Detecting semantic cloaking on the web

Proceedings of the 15th international conference on World Wide Web
Improving web spam classifiers using link structure

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Measuring similarity to detect qualified links

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Web spam detection via commercial intent analysis

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Know your neighbors: web spam detection using the web topology

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Link analysis for Web spam detection

ACM Transactions on the Web (TWEB)
Combating Spamdexing: Incorporating Heuristics in Link-Based Ranking

Algorithms and Models for the Web-Graph
Cleaning search results using term distance features

AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Exploring linguistic features for web spam detection: a preliminary study

AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Predicting web spam with HTTP session information

Proceedings of the 17th ACM conference on Information and knowledge management
Web spam filtering in internet archives

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Detecting spam blogs: a machine learning approach

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Approximate Tree Kernels

The Journal of Machine Learning Research
On the robustness of google scholar against spam

Proceedings of the 21st ACM conference on Hypertext and hypermedia
Adversarial Web Search

Foundations and Trends in Information Retrieval
Detecting fake websites: the contribution of statistical learning theory

MIS Quarterly
Survey on web spam detection: principles and algorithms

ACM SIGKDD Explorations Newsletter
An analysis of optimal link bombs

Theoretical Computer Science
Detecting Fake Medical Web Sites Using Recursive Trust Labeling

ACM Transactions on Information Systems (TOIS)
A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis

International Journal of Information Security and Privacy
On the hardness of evading combinations of linear classifiers

Proceedings of the 2013 ACM workshop on Artificial intelligence and security

Abstract

The page rank of a commercial web site has an enormous economic impact because it directly influences the number of potential customers that find the site as a highly ranked search engine result. Link spamming – inflating the page rank of a target page by artificially creating many referring pages – has therefore become a common practice. In order to maintain the quality of their search results, search engine providers try to oppose efforts that decorrelate page rank and relevance, and they maintain blacklists of spamming pages; spammers, at the same time, try to camouflage their spam pages. We formulate the problem of identifying link spam and discuss a methodology for generating training data. Experiments reveal the effectiveness of classes of intrinsic and relational attributes and shed light on the robustness of classifiers against obfuscation of attributes by an adversarial spammer. We identify open research problems related to web spam.
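
The link-spamming mechanism named in the abstract can be illustrated with a minimal sketch. It assumes the standard power-iteration formulation of PageRank with a damping factor of 0.85, which is not taken from the paper. The toy graph, the page names, and the helpers `pagerank` and `with_link_farm` are hypothetical. The sketch only shows why many artificial referring pages raise a target's score; it does not implement the paper's classifier for identifying link spam.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its rank to every page it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


def with_link_farm(links, target, farm_size):
    """Copy the graph and add farm_size hypothetical pages that link only to target."""
    farmed = {p: list(ts) for p, ts in links.items()}
    for i in range(farm_size):
        farmed[f"farm{i}"] = [target]
    return farmed


# Hypothetical toy web: "shop" has no incoming links at first.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "shop": ["a"],
}

before = pagerank(web)
after = pagerank(with_link_farm(web, "shop", 20))
print(f"shop before: {before['shop']:.4f}, after: {after['shop']:.4f}")
```

In this toy graph the score of "shop" rises once the farm pages are added, even though nothing about the page's relevance has changed. That gap between rank and relevance is what the abstract means by decorrelation.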