Detecting malicious tweets in trending topics using a statistical analysis of language

Authors:
Juan Martinez-Romo;Lourdes Araujo
Affiliations:
NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain;NLP & IR Group, Dpto. Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia (UNED), Madrid 28040, Spain
Venue:
Expert Systems with Applications: An International Journal
Year:
2013

Quantified Score

Hi-index	12.05

Abstract

Twitter spam detection is a recent area of research in which most previous work has focused on identifying malicious user accounts and on honeypot-based approaches. In this paper we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. We present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34K trending topics and 20 million tweets. We then propose a reduced set of features that are hard for spammers to manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other feature sets to analyze emergent characteristics of spam in social networks. We have also conducted an extensive evaluation, which shows that our system obtains an F-measure on par with the best state-of-the-art systems based on the detection of spam accounts. Our system can therefore be applied to Twitter spam detection in trending topics in real time, mainly because it analyzes tweets instead of user accounts.
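
The abstract does not say which language statistic is used, so the following is only an illustrative sketch of one plausible reading of "statistical analysis of language": it compares a smoothed unigram model of a single tweet against a model built from other tweets in the same trending topic, using Kullback-Leibler divergence. The function names, the whitespace tokenization, the add-one smoothing, and the toy tweets are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumes every token in `tokens` is also in `vocab`.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def tweet_topic_divergence(tweet, topic_tweets):
    """Divergence between one tweet and the tweets of its trending topic.

    Hypothetical feature: a larger value means the tweet's wording
    differs more from the topic's wording.
    """
    tweet_tokens = tweet.lower().split()
    topic_tokens = [t for text in topic_tweets for t in text.lower().split()]
    vocab = set(tweet_tokens) | set(topic_tokens)
    p = unigram_model(tweet_tokens, vocab)
    q = unigram_model(topic_tokens, vocab)
    return kl_divergence(p, q)


# Toy data, invented for illustration only.
topic = [
    "great goal in the final",
    "what a final match",
    "the final goal was great",
]
on_topic = tweet_topic_divergence("great final goal", topic)
off_topic = tweet_topic_divergence("win free iphone click here", topic)
print(on_topic < off_topic)
```

In a setup like this, the divergence would be one numeric feature among several passed to a classifier. It needs only the tweet text and the topic text, which fits the abstract's goal of judging tweets without prior information about the user.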