Detecting semantic cloaking on the web

Authors:
Baoning Wu; Brian D. Davison
Affiliations:
Lehigh University, Bethlehem, PA; Lehigh University, Bethlehem, PA
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

Abstract

By supplying different versions of a web page to search engines and to browsers, a content provider attempts to cloak the real content from the view of the search engine. Semantic cloaking refers to differences in meaning between pages which have the effect of deceiving search engine ranking algorithms. In this paper, we propose an automated two-step method to detect semantic cloaking pages based on different copies of the same page downloaded by a web crawler and a web browser. The first step is a filtering step, which generates a candidate list of semantic cloaking pages. In the second step, a classifier is used to detect semantic cloaking pages from the candidates generated by the filtering step. Experiments on manually labeled data sets show that we can generate a classifier with a precision of 93% and a recall of 85%. We apply our approach to links from the dmoz Open Directory Project and estimate that more than 50,000 of these pages employ semantic cloaking.
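
The two-step structure described in the abstract (filter, then classify) can be illustrated with a minimal Python sketch. The abstract does not specify the filtering rule or the classifier features, so everything concrete here is an assumption made for illustration: the term-difference heuristic, the `min_diff` threshold, the feature names `only_crawler` and `only_browser`, and the helper names `terms`, `diff_features`, `is_candidate`, `detect`, and `precision_recall` are all hypothetical and are not taken from the paper.

```python
import re
from typing import Callable, Dict, List, Sequence, Set, Tuple


def terms(html: str) -> Set[str]:
    """Lowercased word tokens of one page copy, with tags stripped crudely."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def diff_features(crawler_copies: Sequence[str],
                  browser_copies: Sequence[str]) -> Dict[str, int]:
    """Count terms that appear in every copy seen by one client and in no
    copy seen by the other. Requiring a term in *every* copy is an assumed
    way to ignore content that changes on each download (ads, counters).
    Each sequence must contain at least one copy."""
    crawler_sets = [terms(c) for c in crawler_copies]
    browser_sets = [terms(b) for b in browser_copies]
    only_crawler = set.intersection(*crawler_sets) - set.union(*browser_sets)
    only_browser = set.intersection(*browser_sets) - set.union(*crawler_sets)
    return {"only_crawler": len(only_crawler),
            "only_browser": len(only_browser)}


def is_candidate(crawler_copies: Sequence[str],
                 browser_copies: Sequence[str],
                 min_diff: int = 3) -> bool:
    """Step 1 (filter): keep pages whose crawler and browser copies differ
    by at least min_diff stable terms. The threshold is an assumption."""
    f = diff_features(crawler_copies, browser_copies)
    return f["only_crawler"] + f["only_browser"] >= min_diff


def detect(pages: Dict[str, Tuple[Sequence[str], Sequence[str]]],
           classifier: Callable[[Dict[str, int]], bool]) -> List[str]:
    """Step 2 (classify): run a caller-supplied classifier on candidates only."""
    flagged = []
    for url, (crawler_copies, browser_copies) in pages.items():
        if not is_candidate(crawler_copies, browser_copies):
            continue
        if classifier(diff_features(crawler_copies, browser_copies)):
            flagged.append(url)
    return flagged


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |actual| (0.0 if undefined)."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


# Toy usage with made-up pages and a stand-in rule instead of a trained model.
pages = {
    "cloaked": (["<p>cheap pills casino poker buy now</p>"] * 2,
                ["<p>Welcome to my garden blog</p>"] * 2),
    "dynamic": (["<p>Hello world 111</p>", "<p>Hello world 222</p>"],
                ["<p>Hello world 333</p>", "<p>Hello world 444</p>"]),
}
flagged = detect(pages, classifier=lambda f: f["only_crawler"] >= 5)
```

In this toy run the filter discards the page whose copies differ only in per-download noise, so the classifier sees a smaller candidate set. A real classifier would be trained on manually labeled pages, as the abstract describes, and `precision_recall` shows how the reported precision and recall figures are defined.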