Improving web spam classifiers using link structure

Authors:
Qingqing Gan; Torsten Suel
Affiliations:
Polytechnic University, Brooklyn, NY (both authors)
Venue:
AIRWeb '07: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Year:
2007

References (14):
Enhanced hypertext categorization using hyperlinks. SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data.
Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM).
Topical locality in the Web. SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval.
Challenges in web search engines. ACM SIGIR Forum.
Design and Implementation of a High-Performance Distributed Web Crawler. ICDE '02: Proceedings of the 18th International Conference on Data Engineering.
The connectivity sonar: detecting site functionality by structural patterns. Proceedings of the fourteenth ACM conference on Hypertext and hypermedia.
On the Evolution of Clusters of Near-Duplicate Web Pages. LA-WEB '03: Proceedings of the First Conference on Latin American Web Congress.
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages. Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004.
Identifying link farm spam pages. WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web.
Detecting spam web pages through content analysis. Proceedings of the 15th international conference on World Wide Web.
Detecting semantic cloaking on the web. Proceedings of the 15th international conference on World Wide Web.
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann Series in Data Management Systems.
Combating web spam with TrustRank. VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
Thwarting the nigritude ultramarine: learning to identify link spam. ECML'05: Proceedings of the 16th European conference on Machine Learning.

Cited by (15):
Improving web spam detection with re-extracted features. Proceedings of the 17th international conference on World Wide Web.
Adversarial Information Retrieval on the Web (AIRWeb 2007). ACM SIGIR Forum.
Cleaning search results using term distance features. AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web.
Web spam identification through content and hyperlinks. AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web.
The anti-social tagger: detecting spam in social bookmarking systems. AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web.
Spam characterization and detection in peer-to-peer file-sharing systems. Proceedings of the 17th ACM conference on Information and knowledge management.
Cost-effective spam detection in p2p file-sharing systems. Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval.
A brief survey of computational approaches in social computing. IJCNN'09: Proceedings of the 2009 international joint conference on Neural Networks.
Improving spamdexing detection via a two-stage classification strategy. AIRS'08: Proceedings of the 4th Asia information retrieval conference on Information retrieval technology.
On the robustness of Google Scholar against spam. Proceedings of the 21st ACM conference on Hypertext and hypermedia.
Adversarial Web Search. Foundations and Trends in Information Retrieval.
The nuts and bolts of a forum spam automator. LEET'11: Proceedings of the 4th USENIX conference on Large-scale exploits and emergent threats.
Survey on web spam detection: principles and algorithms. ACM SIGKDD Explorations Newsletter.
Detecting Fake Medical Web Sites Using Recursive Trust Labeling. ACM Transactions on Information Systems (TOIS).
Community-based features for identifying spammers in online social networks. Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

Abstract

Web spam has been recognized as one of the top challenges in the search engine industry [14]. Much recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, whenever a new anti-spam technique is developed, spammers design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can adapt quickly to such newly developed spam techniques. We describe a two-stage approach that improves the performance of common classifiers. In the first stage, we apply a classifier that catches a large portion of the spam in our data. In the second stage, we use several heuristics to decide whether a node should be relabeled, based on its preclassified (first-stage) result and on knowledge about its neighborhood. Our experimental results show noticeable improvements in both precision and recall.
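As a rough illustration of the two-stage idea, the sketch below assumes the first-stage classifier outputs a spam probability per node and uses a simple neighbor-majority rule for the second stage. The functions `first_stage` and `second_stage`, the thresholds `spam_frac`, `normal_frac`, and `min_degree`, and the majority rule itself are hypothetical; they are not the specific heuristics proposed in the paper.

```python
"""Hypothetical two-stage spam labeling: a base classifier, then neighborhood relabeling."""


def first_stage(scores, threshold=0.5):
    """Stage 1: convert per-node spam probabilities (assumed to come from any
    base classifier) into boolean labels, True meaning spam."""
    return {node: score >= threshold for node, score in scores.items()}


def second_stage(labels, neighbors, spam_frac=0.7, normal_frac=0.3, min_degree=3):
    """Stage 2: one relabeling pass using a hypothetical neighbor-majority rule.

    A node predicted non-spam is flipped to spam if at least `spam_frac` of its
    labeled neighbors are spam; a node predicted spam is flipped to non-spam if
    at most `normal_frac` of them are spam. Nodes with fewer than `min_degree`
    labeled neighbors keep their stage-1 label. All decisions read the stage-1
    labels, so the result does not depend on iteration order.
    """
    relabeled = dict(labels)  # copy so the stage-1 labels stay untouched
    for node, label in labels.items():
        known = [labels[n] for n in neighbors.get(node, ()) if n in labels]
        if len(known) < min_degree:
            continue  # too little neighborhood evidence to override stage 1
        spam_share = sum(known) / len(known)
        if not label and spam_share >= spam_frac:
            relabeled[node] = True
        elif label and spam_share <= normal_frac:
            relabeled[node] = False
    return relabeled


if __name__ == "__main__":
    # Toy data: made-up classifier scores and neighbor lists (e.g. linked hosts).
    scores = {"a": 0.9, "b": 0.8, "c": 0.85, "d": 0.4, "e": 0.1, "f": 0.05, "g": 0.2}
    neighbors = {"d": ["a", "b", "c"], "e": ["f", "g"], "a": ["e", "f", "g"]}
    labels = first_stage(scores)
    final = second_stage(labels, neighbors)
    print(final)  # d flips to spam, a flips to non-spam, e is left unchanged
```

Reading only the stage-1 labels keeps this single pass deterministic; an iterative variant that feeds relabeled nodes back into later decisions is another plausible design, and in practice the thresholds would need tuning on labeled data.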