Splog Filtering Based on Writing Consistency

Authors:
Wei Liu;Songbo Tan;Hongbo Xu;Lihong Wang
Affiliations:
-;-;-;-
Venue:
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Year:
2008

Citing 6
Cited 0

C4.5: programs for machine learning

C4.5: programs for machine learning
Splog detection using self-similarity analysis on blog temporal dynamics

AIRWeb '07 Proceedings of the 3rd international workshop on Adversarial information retrieval on the web
Detection of Bloggers' Interests: Using Textual, Temporal, and Interactive Features

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Combating web spam with trustrank

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Extracting spam blogs with co-citation clusters

Proceedings of the 17th international conference on World Wide Web
Weblog classification for fast splog filtering: a URL language model segmentation approach

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Splog is the key challenge in the access of blogosphere. Existing splog-filtering methods are restricted to the way for traditional web spam filtering, without considering the characteristics of blogs. Inspired by the observation that fake writers (writers of splogs) have striking higher consistent writing behavior than real writers (writers of legitimate blogs), we propose to detect splogs by distinguishing fake writers from real writers. To measure how consistent the writing behavior is, we propose the consistency-based features derived from writing interval, writing structure and writing topic. Then we designed a splog-filtering system which can use the consistency-based features effectively and flexibly. The experimental results on Blog06 data set show that, proposed measure can effectively detect splogs, reaching an accuracy of 90%. Compared with content-based methods, our approach can get a comparable accuracy with fewer features and smaller train set, indicating that writing consistency represents the essential difference between splogs and blogs.