C4.5: programs for machine learning
C4.5: programs for machine learning
Improved algorithms for topic distillation in a hyperlinked environment
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
The anatomy of a large-scale hypertextual Web search engine
WWW7 Proceedings of the seventh international conference on World Wide Web 7
Making large-scale support vector machine learning practical
Advances in kernel methods
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
Analysis of a very large web search engine query log
ACM SIGIR Forum
The stochastic approach for link-structure analysis (SALSA) and the TKC effect
Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Enhanced topic distillation using text, markup tags, and hyperlinks
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
The structure of broad topics on the web
Proceedings of the 11th international conference on World Wide Web
Proceedings of the 11th international conference on World Wide Web
The Evolution of the Web and Implications for an Incremental Crawler
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
A large-scale study of the evolution of web pages
WWW '03 Proceedings of the 12th international conference on World Wide Web
Challenges in web search engines
ACM SIGIR Forum
The connectivity sonar: detecting site functionality by structural patterns
Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
Building Nutch: Open Source Search
Queue - Search Engines
What's new on the web?: the evolution of the web from a search engine perspective
Proceedings of the 13th international conference on World Wide Web
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Ranking definitions with supervised learning methods
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Identifying link farm spam pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Using ODP metadata to personalize search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Thwarting the nigritude ultramarine: learning to identify link spam
ECML'05 Proceedings of the 16th European conference on Machine Learning
Spam double-funnel: connecting web spammers with advertisers
Proceedings of the 16th international conference on World Wide Web
Improving web spam classifiers using link structure
AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Query-log mining for detecting spam
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
The anti-social tagger: detecting spam in social bookmarking systems
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Detection of cloaked web spam by using tag-based methods
Expert Systems with Applications: An International Journal
Looking into the past to better classify web spam
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
A comparison of fraud cues and classification methods for fake escrow website detection
Information Technology and Management
Identifying and resolving hidden text salting
IEEE Transactions on Information Forensics and Security
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality
Foundations and Trends in Information Retrieval
No plan survives contact: experience with cybercrime measurement
CSET'11 Proceedings of the 4th conference on Cyber security experimentation and test
deSEO: combating search-result poisoning
SEC'11 Proceedings of the 20th USENIX conference on Security
SURF: detecting and measuring search poisoning
Proceedings of the 18th ACM conference on Computer and communications security
Cloak and dagger: dynamics of web search cloaking
Proceedings of the 18th ACM conference on Computer and communications security
Web Spam Detection by Exploring Densely Connected Subgraphs
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
Survey on web spam detection: principles and algorithms
ACM SIGKDD Explorations Newsletter
Fighting against web spam: a novel propagation method based on click-through data
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Detecting Fake Medical Web Sites Using Recursive Trust Labeling
ACM Transactions on Information Systems (TOIS)
RAID'12 Proceedings of the 15th international conference on Research in Attacks, Intrusions, and Defenses
Shady paths: leveraging surfing crowds to detect malicious web pages
Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Cross-modal social image clustering and tag cleansing
Journal of Visual Communication and Image Representation
Hi-index | 0.00 |
By supplying different versions of a web page to search engines and to browsers, a content provider attempts to cloak the real content from the view of the search engine. Semantic cloaking refers to differences in meaning between pages which have the effect of deceiving search engine ranking algorithms. In this paper, we propose an automated two-step method to detect semantic cloaking pages based on different copies of the same page downloaded by a web crawler and a web browser. The first step is a filtering step, which generates a candidate list of semantic cloaking pages. In the second step, a classifier is used to detect semantic cloaking pages from the candidates generated by the filtering step. Experiments on manually labeled data sets show that we can generate a classifier with a precision of 93% and a recall of 85%. We apply our approach to links from the dmoz Open Directory Project and estimate that more than 50,000 of these pages employ semantic cloaking.