The Journal of Machine Learning Research
Spam, damn spam, and statistics: using statistical analysis to locate spam web pages
Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
Detecting phrase-level duplication on the world wide web
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Detecting spam web pages through content analysis
Proceedings of the 15th international conference on World Wide Web
Link spam detection based on mass estimation
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Introduction to Information Retrieval
Introduction to Information Retrieval
Latent dirichlet allocation in web spam filtering
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web
Web spam identification through language model analysis
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Linked latent Dirichlet allocation in web spam filtering
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Link spam target detection using page farms
ACM Transactions on Knowledge Discovery from Data (TKDD)
Web spam detection: new classification features based on qualified link analysis and language models
IEEE Transactions on Information Forensics and Security
Web spam classification: a few features worth more
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality
Survey on web spam detection: principles and algorithms
ACM SIGKDD Explorations Newsletter
Hi-index | 0.00 |
Recent studies about web spam detection have utilized various content-based and link-based features to construct a spam classification model. In this paper, we conduct a thorough analysis of content spam on the web using topic models and propose several novel topical diversity measures for content spam detection. We adopt the web spam benchmark data set WEBSPAM-UK2007 for evaluation, and the experimental results verify that by integrating our topical diversity measures the performance of the state-of-the-art web spam detection methods can be greatly improved. In addition, comparing to existing features for training spam classification models, our topical diversity measures can achieve high spam detection performance using small set of training data. In personalized web spam detection, the training data (i.e., user's spam labeling results) are typically small. Our finding makes personalized web spam detection highly achievable. We develop an efficient and effective regression model using topical diversity measures for personalized web spam detection, and present some promising results obtained from an empirical study.