Web spam attempts to manipulate search engine ranking algorithms in order to boost the rankings of specific web pages in search results. Cloaking is a widely adopted technique for concealing web spam: the server returns different content to search engine crawlers than it displays in a web browser. Previous work on cloaking detection relies mainly on differences in terms and/or links between multiple copies of a URL retrieved from the web browser and search engine crawler perspectives. This work presents three methods that use differences in tags to determine whether a URL is cloaked. Since the tags of a web page generally change less frequently and less significantly than its terms and links, tag-based cloaking detection methods can work more effectively than term- or link-based methods. The proposed methods are tested on a dataset of URLs covering short-, medium-, and long-term user interests. Experimental results indicate that the tag-based methods outperform term- and link-based methods in both precision and recall. Moreover, a Weka J4.8 classifier using a combination of term and tag features achieves an accuracy of 90.48%.
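The abstract does not spell out the three tag-based methods, so the following is only a minimal sketch of the general idea: compare the HTML tags of the copy served to a browser with those of the copy served to a crawler, and flag the URL when they diverge. The function names, the multiset-difference measure, and the `threshold` value are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def extract_tags(html):
    """Return the list of opening tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tag_difference(browser_html, crawler_html):
    """Count tags present in one copy but unmatched in the other.

    Illustrative measure (an assumption): size of the multiset
    symmetric difference of the two tag collections.
    """
    browser_tags = Counter(extract_tags(browser_html))
    crawler_tags = Counter(extract_tags(crawler_html))
    unmatched = (browser_tags - crawler_tags) + (crawler_tags - browser_tags)
    return sum(unmatched.values())


def looks_cloaked(browser_html, crawler_html, threshold=3):
    """Flag a URL when the tag difference exceeds a hypothetical threshold."""
    return tag_difference(browser_html, crawler_html) > threshold
```

Because only tags are compared, two copies whose text differs but whose markup is identical produce a difference of zero, which reflects the abstract's point that tags are more stable than terms and links. A practical detector would likely also compare several copies fetched from the same perspective to estimate how much a page changes on its own.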