Measurement and classification of humans and bots in internet chat

Authors:
Steven Gianvecchio;Mengjun Xie;Zhenyu Wu;Haining Wang
Affiliations:
Department of Computer Science, The College of William and Mary;Department of Computer Science, The College of William and Mary;Department of Computer Science, The College of William and Mary;Department of Computer Science, The College of William and Mary
Venue:
SS'08 Proceedings of the 17th USENIX Security Symposium
Year:
2008

Abstract

The abuse of chat services by automated programs, known as chat bots, poses a serious threat to Internet users. Chat bots target popular chat networks to distribute spam and malware. In this paper, we first conduct a series of measurements on a large commercial chat network. Our measurements capture a total of 14 different types of chat bots, ranging from simple to advanced. Moreover, we observe that human behavior is more complex than bot behavior. Based on the measurement study, we propose a classification system to accurately distinguish chat bots from human users. The proposed classification system consists of two components: (1) an entropy-based classifier and (2) a machine-learning-based classifier. The two classifiers complement each other in chat bot detection. The entropy-based classifier is more accurate at detecting unknown chat bots, whereas the machine-learning-based classifier is faster at detecting known chat bots. Our experimental evaluation shows that the proposed classification system is highly effective in differentiating bots from humans.
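
The observation that human behavior is more complex than bot behavior can be illustrated with a small entropy calculation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it computes the Shannon entropy of binned inter-message delays, where the function name `delay_entropy`, the one-second bin width, and the sample timestamps are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def delay_entropy(timestamps, bin_width=1.0):
    """Shannon entropy (in bits) of binned inter-message delays.

    Assumption: `timestamps` is a sorted list of message send times in
    seconds, and `bin_width` is an illustrative bin size, not a value
    taken from the paper.
    """
    # Delays between consecutive messages.
    delays = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not delays:
        return 0.0
    # Group delays into fixed-width bins and count each bin.
    bins = Counter(int(d // bin_width) for d in delays)
    total = len(delays)
    # H = sum over bins of -p * log2(p)
    return sum(-(n / total) * math.log2(n / total) for n in bins.values())


# Hypothetical data: a timer-driven sender versus an irregular one.
periodic_times = [0, 10, 20, 30, 40, 50]
irregular_times = [0, 3, 4, 19, 27, 61]

periodic_h = delay_entropy(periodic_times)
irregular_h = delay_entropy(irregular_times)
```

In this toy data the perfectly periodic sender yields zero entropy because every delay falls in the same bin, while the irregular sender yields a higher value. A real classifier would need a much longer message history and thresholds chosen from measured data.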