Splogs (spam blogs) are the key challenge in accessing the blogosphere. Existing splog-filtering methods are limited to the techniques of traditional web spam filtering and do not consider the characteristics of blogs. Inspired by the observation that fake writers (writers of splogs) show strikingly more consistent writing behavior than real writers (writers of legitimate blogs), we propose to detect splogs by distinguishing fake writers from real writers. To measure how consistent the writing behavior is, we propose consistency-based features derived from writing interval, writing structure, and writing topic. We then design a splog-filtering system that uses these consistency-based features effectively and flexibly. Experimental results on the Blog06 data set show that the proposed measure detects splogs effectively, reaching an accuracy of 90%. Compared with content-based methods, our approach achieves comparable accuracy with fewer features and a smaller training set, indicating that writing consistency captures the essential difference between splogs and legitimate blogs.
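The abstract does not say how consistency is computed, so the following is only an illustrative sketch of what a writing-interval consistency feature might look like, not the authors' definition. It assumes post times are given as numeric timestamps in seconds and scores regularity as 1 / (1 + CV), where CV is the coefficient of variation of the gaps between consecutive posts. The function name `interval_consistency` and the scoring formula are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def interval_consistency(timestamps):
    """Score posting regularity in (0, 1]; 1.0 means perfectly even gaps.

    Hypothetical feature: 1 / (1 + coefficient of variation of the gaps).
    This is an assumed formula, not the one used in the paper.
    """
    ts = sorted(timestamps)
    # Gaps between consecutive posts, in the same unit as the timestamps.
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three timestamps")
    mu = mean(intervals)
    if mu == 0:
        # All posts share one timestamp: treat as maximally consistent.
        return 1.0
    cv = pstdev(intervals) / mu
    return 1.0 / (1.0 + cv)


# A blog posting exactly once an hour versus one posting in irregular bursts.
regular = [0, 3600, 7200, 10800]
irregular = [0, 60, 86400, 90000]
print(interval_consistency(regular))
print(interval_consistency(irregular))
```

Under this assumed measure, an automated writer posting on a fixed schedule scores close to 1, while a human writer with uneven gaps scores lower. Analogous scores for writing structure and writing topic would need their own definitions, which the abstract does not give.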