Online classified advertisements have become an essential part of the advertising market. Popular classified advertisement sites such as Craigslist, eBay Classifieds, and Oodle attract a huge number of posts and visits. Because of its high commercial potential, the online classified advertisement domain is a target for spammers, and spam has become one of the biggest obstacles to the further development of online advertising. Spam detection in online advertisements is therefore a crucial problem. However, existing Web spam detection approaches developed for other domains do not work well in the advertisement domain. We propose a novel spam detection approach that takes the particular characteristics of this domain into account. Specifically, we propose a novel set of features that strongly discriminate between spam and legitimate advertisement posts. Experiments on a dataset derived from Craigslist advertisements demonstrate the effectiveness of our approach: it improves the F-1 score by 55% over a baseline that uses traditional features alone.
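
For readers unfamiliar with the metric, the following minimal sketch shows how an F-1 score is computed from classification counts and how an improvement over a baseline can be expressed. The function names and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. The sketch also assumes the improvement is measured relative to the baseline score, which the abstract does not state.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F-1 score: harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives, and
    false negatives for the spam class.
    """
    if tp == 0:
        # No spam post was correctly identified, so F-1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, improved: float) -> float:
    """Improvement of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (improved - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical confusion counts, for illustration only.
    baseline_f1 = f1_score(tp=40, fp=40, fn=60)
    proposed_f1 = f1_score(tp=80, fp=20, fn=20)
    gain = relative_improvement(baseline_f1, proposed_f1)
    print(f"baseline F-1: {baseline_f1:.3f}")
    print(f"proposed F-1: {proposed_f1:.3f}")
    print(f"relative improvement: {gain:.1%}")
```

Under the relative reading, a 55% improvement means the proposed score is 1.55 times the baseline score; under an absolute reading it would mean a gain of 0.55 on the 0 to 1 scale.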