Humans and bots in internet chat: measurement, analysis, and automated classification

Authors:
Steven Gianvecchio;Mengjun Xie;Zhenyu Wu;Haining Wang
Affiliations:
MITRE Corporation, McLean, VA;Department of Computer Science, University of Arkansas at Little Rock, Little Rock, AR;Department of Computer Science, College of William and Mary, Williamsburg, VA;Department of Computer Science, College of William and Mary, Williamsburg, VA
Venue:
IEEE/ACM Transactions on Networking (TON)
Year:
2011

Citing 20
Cited 0

Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
An analysis of Internet chat systems

Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement
Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
Scalable Centralized Bayesian Spam Mitigation with Bogofilter

LISA '04 Proceedings of the 18th USENIX conference on System administration
On instant messaging worms, analysis and countermeasures

Proceedings of the 2005 ACM workshop on Rapid malcode
Adaptive Spam Filtering Using Dynamic Feature Space

ICTAI '05 Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence
Fast statistical spam filter by approximate classifications

SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Detecting covert timing channels: an entropy-based approach

Proceedings of the 14th ACM conference on Computer and communications security
Rishi: identify bot contaminated hosts by IRC nickname evaluation

HotBots'07 Proceedings of the first conference on First Workshop on Hot Topics in Understanding Botnets
Exploiting redundancy in natural language to penetrate Bayesian spam filters

WOOT '07 Proceedings of the first USENIX workshop on Offensive Technologies
Analyzing network and content characteristics of spim using honeypots

SRUTI'07 Proceedings of the 3rd USENIX workshop on Steps to reducing unwanted traffic on the internet
BotHunter: detecting malware infection through IDS-driven dialog correlation

SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Turing's Imitation Game: Still an Impossible Challenge for All Machines and Some Judges - An Evaluation of the 2008 Loebner Contest

Minds and Machines
Towards complete node enumeration in a peer-to-peer botnet

Proceedings of the 4th International Symposium on Information, Computer, and Communications Security
Reducing the Attack Surface in Massively Multiplayer Online Role-Playing Games

IEEE Security and Privacy
CAPTCHA: using hard AI problems for security

EUROCRYPT'03 Proceedings of the 22nd international conference on Theory and applications of cryptographic techniques
Honeybot, your man in the middle for automated social engineering

LEET'10 Proceedings of the 3rd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more
Detecting and filtering instant messaging spam: a global and personalized approach

NPSEC'05 Proceedings of the First international conference on Secure network protocols
BotGrep: finding P2P bots with structured graph analysis

USENIX Security'10 Proceedings of the 19th USENIX conference on Security
A study of Internet instant messaging and chat protocols

IEEE Network: The Magazine of Global Internetworking

Abstract

The abuse of chat services by automated programs, known as chat bots, poses a serious threat to Internet users. Chat bots target popular chat networks to distribute spam and malware. In this paper, we first conduct a series of measurements on a large commercial chat network. Our measurements capture a total of 16 different types of chat bots, ranging from simple to advanced. Moreover, we observe that human behavior is more complex than bot behavior. Based on the measurement study, we propose a classification system to accurately distinguish chat bots from human users. The proposed classification system consists of two components: 1) an entropy-based classifier; and 2) a Bayesian-based classifier. The two classifiers complement each other in chat bot detection. The entropy-based classifier is more accurate at detecting unknown chat bots, whereas the Bayesian-based classifier is faster at detecting known chat bots. Our experimental evaluation shows that the proposed classification system is highly effective in differentiating bots from humans.
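
The observation that human behavior is more complex than bot behavior can be illustrated with a rough entropy calculation. The sketch below is not the authors' classifier: the choice of inter-message delay as the feature, the 5-second bin width, the function name `shannon_entropy`, and the sample data are all assumptions made for illustration. It only shows that a regular, timer-like sequence yields lower Shannon entropy than an irregular one.

```python
import math
from collections import Counter

def shannon_entropy(values, bin_width):
    """Shannon entropy (in bits) of values grouped into fixed-width bins."""
    if not values:
        return 0.0
    # Map each value to the index of the bin it falls into.
    bins = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())

# Hypothetical inter-message delays in seconds (made-up data, not from the paper).
bot_delays = [30.0, 30.1, 29.9, 30.0, 30.2, 30.0]     # near-periodic sender
human_delays = [2.5, 41.0, 7.3, 120.4, 15.8, 63.2]    # irregular sender

# Bin width of 5 seconds is an arbitrary assumption.
bot_entropy = shannon_entropy(bot_delays, bin_width=5.0)
human_entropy = shannon_entropy(human_delays, bin_width=5.0)

print(f"bot entropy:   {bot_entropy:.3f} bits")
print(f"human entropy: {human_entropy:.3f} bits")
```

With this made-up data the irregular sequence scores higher, which matches the direction of the abstract's observation. A real detector would need measured features and thresholds, which the abstract does not specify.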