This article addresses the problem of spam blog (splog) detection using the temporal and structural regularity of content, post time, and links. Splogs are undesirable blogs meant to attract search engine traffic and are used solely to promote affiliate sites. Blogs are a popular online medium, and splogs not only degrade the quality of search engine results but also waste network resources. Splog detection is difficult because stable content descriptors are lacking. We have developed a new technique for detecting splogs, based on the observation that a blog is a dynamic, growing sequence of entries (or posts) rather than a collection of individual pages. In our approach, splogs are recognized by their temporal characteristics and content. There are three key ideas in our splog detection framework. (a) We represent the temporal dynamics of a blog using self-similarity matrices defined on the histogram intersection similarity measure of the time, content, and link attributes of posts, in order to investigate temporal changes in the post sequence. (b) We study the temporal characteristics of blogs using a visual representation derived from the self-similarity measures. The visual signature reveals correlations between attributes and posts that depend on the type of blog (normal blog or splog). (c) We propose two types of novel temporal features to capture the temporal characteristics of splogs. In our splog detector, these novel features are combined with content-based features. We extract a content-based feature vector from blog home pages as well as from different parts of the blog. The dimensionality of the feature vector is reduced by Fisher linear discriminant analysis. We have tested an SVM-based splog detector using the proposed features on real-world datasets, achieving 90% accuracy.
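To make idea (a) concrete, the following is a minimal sketch of a self-similarity matrix built from histogram intersection over post content. It is an illustration under assumptions, not the authors' implementation: the abstract does not give the exact form of the measure, so the sketch uses the common definition for normalized histograms (the sum of bin-wise minima), and the function names and toy posts are hypothetical.

```python
from collections import Counter


def normalized_histogram(tokens):
    """Turn a list of tokens into a histogram whose values sum to 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: c / total for tok, c in counts.items()}


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; lies in [0, 1] for normalized histograms.

    Assumption: this is the common form of the measure, not necessarily
    the exact variant used in the article.
    """
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def self_similarity_matrix(posts):
    """Return S where S[i][j] compares post i with post j, in post order."""
    hists = [normalized_histogram(p) for p in posts]
    n = len(hists)
    return [
        [histogram_intersection(hists[i], hists[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy post sequence: two near-identical posts, then a distinct one.
posts = [
    "cheap pills buy now".split(),
    "cheap pills buy now".split(),
    "hiking trip photos from the lake".split(),
]
S = self_similarity_matrix(posts)
for row in S:
    print(["%.2f" % value for value in row])
```

In this toy sequence the two repeated posts produce a high off-diagonal entry, while the unrelated post produces an entry of zero. The same construction could be applied to histograms of post times or outgoing links, giving one matrix per attribute.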