Detecting spam blogs: a machine learning approach

Authors:
Pranam Kolari, Akshay Java, Tim Finin, Tim Oates, Anupam Joshi
Affiliations:
University of Maryland Baltimore County, Baltimore, MD (all five authors)
Venue:
AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence, Volume 2
Year:
2006

Abstract

Weblogs, or blogs, are an important new way to publish information, engage in discussions, and form communities on the Internet. Unfortunately, the Blogosphere has been infected by several varieties of spam-like content. Blog search engines, for example, are inundated by posts from splogs: false blogs with machine-generated or hijacked content whose sole purpose is to host ads or raise the PageRank of target sites. We discuss how support vector machine (SVM) models based on local and link-based features can be used to detect splogs. We present an evaluation of the learned models and their utility to blog search engines, which employ techniques that differ from those of conventional web search engines.
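The abstract does not spell out the feature set or the SVM solver, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a binary bag-of-words as the "local" features and two hypothetical link-based features: the log of the outlink count and the share of outlinks pointing at the single most-linked host. It then trains a linear SVM with a Pegasos-style subgradient solver. The solver is swapped in for illustration and is not necessarily what the authors used. The helper names (`featurize`, `train_linear_svm`, `predict`) and the toy posts and URLs are invented for this example.

```python
import math
from collections import Counter


def local_features(text):
    """Hypothetical local features: binary bag-of-words over whitespace tokens."""
    return {"w:" + tok: 1.0 for tok in text.lower().split()}


def link_features(outlinks):
    """Hypothetical link-based features: log(1 + number of outlinks) and the
    share of outlinks that point at the single most-linked host."""
    feats = {"link:count": math.log1p(len(outlinks))}
    hosts = [url.split("/")[2] for url in outlinks if "://" in url]
    if hosts:
        feats["link:top_host_share"] = Counter(hosts).most_common(1)[0][1] / len(hosts)
    return feats


def featurize(text, outlinks):
    """Concatenate local and link-based features into one sparse vector (dict)."""
    return {**local_features(text), **link_features(outlinks)}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(examples, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    examples: list of (feature_dict, label) pairs with label in {+1, -1}.
    Assumption: cycles through the data in a fixed order (Pegasos proper
    samples examples at random) and omits the bias term for brevity.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in examples:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            violated = y * dot(w, x) < 1.0   # hinge loss is active for this example
            for k in w:                      # shrink step: gradient of (lam/2)*||w||^2
                w[k] *= 1.0 - eta * lam
            if violated:                     # hinge-loss subgradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Return +1 for a splog and -1 for an authentic blog post."""
    return 1 if dot(w, x) >= 0.0 else -1


# Toy training data (invented for illustration): +1 = splog, -1 = authentic.
train = [
    (featurize("cheap casino bonus buy now",
               ["http://spam.example/a", "http://spam.example/b", "http://spam.example/c"]), +1),
    (featurize("notes from my weekend hiking trip",
               ["http://trails.example/map", "http://weather.example/today"]), -1),
    (featurize("casino bonus cheap pills buy",
               ["http://spam.example/d", "http://spam.example/e", "http://spam.example/f"]), +1),
    (featurize("my weekend reading list and notes",
               ["http://books.example/list", "http://library.example/hours"]), -1),
]
w = train_linear_svm(train)
print(predict(w, featurize("buy casino bonus now", ["http://spam.example/x"] * 3)))  # → 1
```

Keeping both feature families in one sparse vector lets a single linear model weigh textual evidence against link structure. A production system would more likely use a mature SVM package and a much richer, validated feature set.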