In this paper, we study the classification of web spam. Web spam refers to pages that use techniques to mislead search engines into assigning them a higher rank, thus increasing their site traffic. Our contributions are twofold. First, we find that the method of dataset construction is crucial for accurate spam classification, and we note that this problem occurs generally in learning problems and can be hard to detect. In particular, we find that ensuring that no domain appears in both the training and test sets is necessary to test a web spam classifier accurately. In our case, classification performance can differ by as much as 40% in precision when using non-domain-separated data. Second, we show that rank-time features can improve the performance of a web spam classifier. Our paper is the first to investigate the use of rank-time features, and in particular query-dependent rank-time features, for web spam detection. We show that the use of rank-time and query-dependent features can lead to an increase in accuracy over a classifier trained using page-based content only.
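The domain-separation requirement can be illustrated with a minimal sketch. It is not the procedure used in the paper; it only shows one way to guarantee that no domain contributes pages to both the training and test sets. The function names `domain_of` and `domain_separated_split`, the `test_fraction` and `seed` parameters, and the use of the URL host as the "domain" are all assumptions made for illustration.

```python
import random
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL (assumed stand-in for 'domain')."""
    return urlparse(url).netloc.lower()


def domain_separated_split(urls, test_fraction=0.25, seed=0):
    """Split URLs so each domain lands entirely in train or entirely in test."""
    # Group pages by domain so a domain is always assigned as a unit.
    by_domain = {}
    for url in urls:
        by_domain.setdefault(domain_of(url), []).append(url)

    # Shuffle the domains (not the pages) with a fixed seed for repeatability.
    domains = sorted(by_domain)
    random.Random(seed).shuffle(domains)

    # Hold out a fraction of the domains, at least one, for testing.
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train = [u for d in domains if d not in test_domains for u in by_domain[d]]
    test = [u for d in domains if d in test_domains for u in by_domain[d]]
    return train, test


# Hypothetical example URLs, not data from the paper.
example_urls = [
    "http://a.example/page1", "http://a.example/page2",
    "http://b.example/page1", "http://c.example/page1",
    "http://c.example/page2", "http://d.example/page1",
]
train_urls, test_urls = domain_separated_split(example_urls)
```

Splitting at the page level instead would let pages from one site fall on both sides of the split, which is the situation the abstract warns can inflate measured precision.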