Spam detection on Twitter using traditional classifiers

Authors:
M. McCord; M. Chuah
Affiliations:
Computer Science & Engineering Department, Lehigh University, Bethlehem, PA (both authors)
Venue:
ATC'11 Proceedings of the 8th international conference on Autonomic and trusted computing
Year:
2011

Abstract

Social networking sites have become very popular in recent years. Users use them to find new friends and to update their existing friends on their latest thoughts and activities. Among these sites, Twitter is the fastest growing. Its popularity also attracts many spammers, who flood legitimate users' accounts with large volumes of spam messages. In this paper, we discuss some user-based and content-based features that differ between spammers and legitimate users, and we use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their follower/following information, and their 100 most recent tweets. We then evaluated our detection scheme based on the suggested user-based and content-based features. Our results show that, among the four classifiers we evaluated, the Random Forest classifier produces the best results. Our spam detector achieves 95.7% precision and a 95.7% F-measure using the Random Forest classifier.
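
To make the pipeline in the abstract concrete, the following is a minimal sketch of how user-based and content-based features might be derived from an account's follower/following counts and recent tweets, and how precision and F-measure are computed from predictions. The specific features (`follower_ratio`, `urls_per_tweet`, and so on), the function names, and the regular expressions are illustrative assumptions; the abstract does not list the paper's actual feature set.

```python
import re

# Simple patterns for content-based cues (assumed, not taken from the paper).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def extract_features(followers, following, tweets):
    """Build a hypothetical feature dict for one account.

    followers, following: integer counts for the account.
    tweets: list of the account's recent tweet texts.
    """
    n = max(len(tweets), 1)  # avoid division by zero for empty timelines
    total = followers + following
    return {
        # User-based features (illustrative choices).
        "followers": followers,
        "following": following,
        "follower_ratio": followers / total if total else 0.0,
        # Content-based features, averaged over the supplied tweets.
        "urls_per_tweet": sum(len(URL_RE.findall(t)) for t in tweets) / n,
        "mentions_per_tweet": sum(len(MENTION_RE.findall(t)) for t in tweets) / n,
        "hashtags_per_tweet": sum(len(HASHTAG_RE.findall(t)) for t in tweets) / n,
    }


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    feats = extract_features(10, 90, ["buy now http://a.example #deal", "hi @bob"])
    print(feats)
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Feature dicts like these could be converted to numeric vectors and passed to any standard classifier, for example scikit-learn's `RandomForestClassifier`; the evaluation helper then reports the same kind of precision and F-measure figures quoted in the abstract.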