Tag spam creates large non-giant connected components

Authors:
Nicolas Neubauer;Robert Wetzker;Klaus Obermayer
Affiliations:
Technische Universität Berlin;Technische Universität Berlin;Technische Universität Berlin
Venue:
Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Year:
2009

Citing 4
Cited 2

Fighting Spam on Social Web Sites: A Survey of Approaches and Future Challenges
IEEE Internet Computing

Weighted graphs and disconnected components: patterns and a generator
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

The anti-social tagger: detecting spam in social bookmarking systems
AIRWeb '08 Proceedings of the 4th international workshop on Adversarial information retrieval on the web

Hyperincident connected components of tagging networks
Proceedings of the 20th ACM conference on Hypertext and hypermedia

Adversarial Web Search
Foundations and Trends in Information Retrieval

A Local Method for ObjectRank Estimation
Proceedings of International Conference on Information Integration and Web-based Applications & Services

Abstract

Spammers in social bookmarking systems try to mimic the bookmarking behaviour of real users to gain the attention of other users or search engines. Several methods have been proposed for detecting such spam, using domain-specific features (such as URL terms) or the similarity of users to previously identified spammers. However, as shown in our previous work, it is possible to identify a large fraction of spam users based on purely structural features. The hypergraph connecting documents, users, and tags can be decomposed into connected components, and the large but non-giant components turned out to be almost entirely inhabited by spam users in the examined dataset. Here, we test to what degree the decomposition of the complete hypergraph is really necessary by examining the component structure of the induced user/document and user/tag graphs. While the user/tag graph's connectivity does not help in classifying spammers, the user/document graph's connectivity is already highly informative. It can, however, be augmented with connectivity information from the hypergraph. In our view, spam detection based on structural features, like the one proposed here, requires complex adaptation strategies from spammers and may complement other, more traditional detection approaches.
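
The structural idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes bookmarks are given as (user, document) pairs, builds the induced user/document graph with a union-find structure, and flags users in components that are large but not the largest. The function names, the toy data, the choice to measure component size in users, and the `min_size` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def find(parent, node):
    """Return the root of node's set, shortening the path along the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def user_components(posts):
    """Connected components of the user/document graph, as sets of users.

    posts is an iterable of (user, document) pairs. Components are
    returned largest first, with size measured in users (an assumption).
    """
    parent = {}
    for user, doc in posts:
        # Tag nodes by type so a user and a document can share a name.
        u, d = ("user", user), ("doc", doc)
        parent.setdefault(u, u)
        parent.setdefault(d, d)
        root_u, root_d = find(parent, u), find(parent, d)
        if root_u != root_d:
            parent[root_u] = root_d

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "user":
            groups[find(parent, node)].add(node[1])
    return sorted(groups.values(), key=len, reverse=True)


def suspicious_users(posts, min_size=3):
    """Users in large, non-giant components.

    The largest component is treated as the giant one and skipped;
    min_size is a hypothetical cut-off for what counts as "large".
    """
    flagged = set()
    for component in user_components(posts)[1:]:
        if len(component) >= min_size:
            flagged |= component
    return flagged


# Toy data (invented): four users sharing common documents, three
# users who only bookmark one isolated document, and one lone user.
posts = [
    ("alice", "d1"), ("bob", "d1"), ("bob", "d2"),
    ("carol", "d2"), ("dave", "d2"),
    ("s1", "x1"), ("s2", "x1"), ("s3", "x1"),
    ("erin", "d9"),
]
flagged = suspicious_users(posts, min_size=3)
```

On this toy input, `flagged` contains `s1`, `s2`, and `s3`: they form a component of three users that is separate from the largest component, while `erin` sits in a component too small to be flagged. The same routine could be run on (user, tag) pairs to compare the two induced graphs, as the abstract does.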