This paper focuses on spam blog (splog) detection. Blogs are a highly popular new-media mechanism for social communication, and the presence of splogs both degrades blog search results and wastes network resources. Our approach exploits the unique temporal dynamics of blogs to detect splogs. The splog detection framework rests on three key ideas. First, we represent blog temporal dynamics using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts. Second, we show via a novel visualization that blog temporal characteristics reveal attribute correlations that depend on the type of blog (normal blog or splog). Third, we propose using temporal structural properties computed from self-similarity matrices across different attributes. In our splog detector, these novel features are combined with content-based features. We extract a content-based feature vector from different parts of the blog (URLs, post content, etc.) and reduce its dimensionality with Fisher linear discriminant analysis. We tested an SVM-based splog detector using the proposed features on real-world datasets, with excellent results (90% accuracy).
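The self-similarity matrix at the core of the framework can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each post is represented here as a hypothetical histogram of term counts over a shared vocabulary. Histograms are L1-normalized, and entry (i, j) of the matrix is the standard histogram intersection, the sum of bin-wise minima. Time or link attributes would be handled the same way with different bins. The summary statistic `mean_off_diagonal` is a hypothetical stand-in for the paper's temporal structural features. It only illustrates how repetitive, splog-like posting produces uniformly high off-diagonal similarity.

```python
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """L1-normalize a histogram so its bins sum to 1 (all-zero stays all-zero)."""
    total = float(sum(counts))
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima; lies in [0, 1] for L1-normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def self_similarity_matrix(post_histograms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry (i, j) is the histogram intersection of post i and post j."""
    hs = [normalize(h) for h in post_histograms]
    n = len(hs)
    return [[histogram_intersection(hs[i], hs[j]) for j in range(n)] for i in range(n)]


def mean_off_diagonal(ssm: Sequence[Sequence[float]]) -> float:
    """Hypothetical summary feature: average similarity between distinct posts."""
    n = len(ssm)
    if n < 2:
        return 0.0
    total = sum(ssm[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical term-count histograms over a 4-term vocabulary, one row per post.
    normal_blog = [[3, 1, 0, 0], [0, 2, 2, 0], [1, 0, 0, 3]]  # varied topics
    splog = [[2, 2, 0, 0], [2, 2, 0, 0], [2, 1, 1, 0]]        # near-duplicate posts
    print(round(mean_off_diagonal(self_similarity_matrix(normal_blog)), 3))  # 0.167
    print(round(mean_off_diagonal(self_similarity_matrix(splog)), 3))        # 0.833
```

Histogram intersection is used here because the abstract names it as the similarity measure. Computing the matrix separately per attribute (time, content, link) would yield the per-attribute matrices whose structure the framework compares.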