Web spam denotes the manipulation of web pages with the sole intent of raising their position in search engine rankings. Because a better position in the rankings directly increases the number of visits to a site, attackers use a variety of techniques to boost their pages to higher ranks. In the best case, web spam pages are a nuisance that earns their owners undeserved advertising revenue. In the worst case, these pages pose a threat to Internet users by hosting malicious content and launching drive-by attacks against unsuspecting victims. When successful, these drive-by attacks install malware on the victims' machines. In this paper, we introduce an approach to detect web spam pages in the list of results returned by a search engine. First, we determine how much different page features contribute to a page's ranking in search engine results. Based on this information, we develop a classification technique that uses the important features to distinguish spam sites from legitimate entries. Removing spam sites from the results frees more slots for links that point to pages with useful content. More importantly, it mitigates the threat posed by malicious web sites, reducing the risk that users are infected by malicious code that spreads via drive-by attacks.
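The two steps described above (estimate which page features matter for ranking, then classify results using only those features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names, the toy data, the use of absolute Pearson correlation with rank position as a stand-in for feature importance, and the nearest-centroid rule as a stand-in for the classifier.

```python
from statistics import mean

# Hypothetical feature names; the abstract does not list the features used.
FEATURES = ["keyword_density", "keyword_in_title", "link_score"]

def pearson(xs, ys):
    """Pearson correlation of two sequences (0.0 if either one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5

def rank_feature_importance(results, features):
    """Step 1 (assumed proxy): order features by |correlation| with rank position."""
    ranks = [r["rank"] for r in results]
    score = {f: abs(pearson([r["features"][f] for r in results], ranks))
             for f in features}
    return sorted(features, key=score.get, reverse=True)

def train_centroids(labeled, features):
    """Step 2 (assumed classifier): per-class mean vector over the chosen features."""
    centroids = {}
    for label in ("spam", "legit"):
        rows = [r for r in labeled if r["label"] == label]
        centroids[label] = [mean(r["features"][f] for r in rows) for f in features]
    return centroids

def classify(result, centroids, features):
    """Assign the label of the closest centroid (squared Euclidean distance)."""
    vec = [result["features"][f] for f in features]
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def filter_results(results, centroids, features):
    """Drop every result that the classifier flags as spam."""
    return [r for r in results if classify(r, centroids, features) != "spam"]

def row(rank, kd, kit, label=None, url=None):
    """Build one toy search result; link_score is held constant on purpose."""
    feats = {"keyword_density": kd, "keyword_in_title": kit, "link_score": 0.5}
    return {"rank": rank, "features": feats, "label": label, "url": url}

# Toy labeled result list (invented values, all features scaled to [0, 1]).
labeled = [
    row(1, 0.9, 1.0, "spam"), row(2, 0.3, 1.0, "legit"),
    row(3, 0.8, 1.0, "spam"), row(4, 0.2, 1.0, "legit"),
    row(5, 0.2, 0.0, "legit"), row(6, 0.1, 0.0, "legit"),
]

top_features = rank_feature_importance(labeled, FEATURES)[:2]
centroids = train_centroids(labeled, top_features)

# Unlabeled results for a new query; URLs are placeholders.
new_results = [
    row(1, 0.95, 1.0, url="http://spam.example/"),
    row(2, 0.25, 1.0, url="http://news.example/"),
    row(3, 0.15, 0.0, url="http://blog.example/"),
]
kept = filter_results(new_results, centroids, top_features)
print(top_features, [r["url"] for r in kept])
```

In this toy data the constant `link_score` feature carries no information about rank and is discarded, and the keyword-stuffed first result is filtered out. A real system would need a far larger labeled set, a held-out evaluation, and a stronger learner than nearest centroid.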