Enhanced hypertext categorization using hyperlinks
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Challenges in web search engines
ACM SIGIR Forum
Design and Implementation of a High-Performance Distributed Web Crawler
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
The connectivity sonar: detecting site functionality by structural patterns
Proceedings of the fourteenth ACM conference on Hypertext and hypermedia
On the Evolution of Clusters of Near-Duplicate Web Pages
LA-WEB '03 Proceedings of the First Conference on Latin American Web Congress
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Identifying link farm spam pages
WWW '05 Special interest tracks and posters of the 14th international conference on World Wide Web
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Detecting semantic cloaking on the web
Proceedings of the 15th international conference on World Wide Web
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
Combating web spam with trustrank
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Thwarting the nigritude ultramarine: learning to identify link spam
ECML'05 Proceedings of the 16th European conference on Machine Learning
Improving web spam detection with re-extracted features
Proceedings of the 17th international conference on World Wide Web
Adversarial Information Retrieval on the Web (AIRWeb 2007)
ACM SIGIR Forum
Cleaning search results using term distance features
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Web spam identification through content and hyperlinks
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
The anti-social tagger: detecting spam in social bookmarking systems
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Spam characterization and detection in peer-to-peer file-sharing systems
Proceedings of the 17th ACM conference on Information and knowledge management
Cost-effective spam detection in p2p file-sharing systems
Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval
A brief survey of computational approaches in social computing
IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Improving spamdexing detection via a two-stage classification strategy
AIRS'08 Proceedings of the 4th Asia information retrieval conference on Information retrieval technology
On the robustness of google scholar against spam
Proceedings of the 21st ACM conference on Hypertext and hypermedia
Foundations and Trends in Information Retrieval
The nuts and bolts of a forum spam automator
LEET'11 Proceedings of the 4th USENIX conference on Large-scale exploits and emergent threats
Survey on web spam detection: principles and algorithms
ACM SIGKDD Explorations Newsletter
Detecting Fake Medical Web Sites Using Recursive Trust Labeling
ACM Transactions on Information Systems (TOIS)
Community-based features for identifying spammers in online social networks
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Hi-index | 0.00 |
Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.