We propose a number of features for Web spam filtering based on the occurrence of keywords that are either of high advertisement value or highly spammed. Our features include popular words from search engine query logs as well as high-cost or high-volume words according to Google AdWords. We also demonstrate the spam filtering power of three further signals: the Online Commercial Intention (OCI) value assigned to a URL by a Microsoft adCenter Labs demonstration, the Yahoo! Mindset classification of Web pages as either commercial or non-commercial, and metrics based on the occurrence of Google ads on the page. We run our tests on the WEBSPAM-UK2006 dataset, recently compiled by Castillo et al. as a standard means of measuring the performance of Web spam detection algorithms. Our features improve the classification accuracy of the publicly available WEBSPAM-UK2006 features by 3%.
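To make the idea of keyword-occurrence features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical dictionary `keyword_values` that maps each keyword to some advertisement value (for example, a cost or volume figure obtained elsewhere); the function name, the tokenization, and the three feature names are all illustrative assumptions.

```python
import re
from typing import Dict


def keyword_features(text: str, keyword_values: Dict[str, float]) -> Dict[str, float]:
    """Compute simple keyword-occurrence features for one page.

    keyword_values is a hypothetical map from keyword to an
    advertisement value; how it is built is outside this sketch.
    """
    # Lowercase and split into alphanumeric tokens (an assumed tokenization).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return {"hit_fraction": 0.0, "mean_value": 0.0, "max_value": 0.0}

    # Values of the tokens that appear in the keyword list.
    values = [keyword_values[t] for t in tokens if t in keyword_values]

    return {
        # Share of tokens that are listed keywords.
        "hit_fraction": len(values) / len(tokens),
        # Total keyword value averaged over all tokens on the page.
        "mean_value": sum(values) / len(tokens),
        # Highest-valued keyword present, or 0.0 if none.
        "max_value": max(values, default=0.0),
    }


if __name__ == "__main__":
    demo_values = {"loans": 4.0, "insurance": 8.0}  # made-up values
    print(keyword_features("Cheap loans and cheap insurance today", demo_values))
```

Feature vectors of this kind could then be appended to an existing feature set and passed to any standard classifier; the choice of classifier is not specified here.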