BotGraph: large scale spamming botnet detection

Authors:
Yao Zhao;Yinglian Xie;Fang Yu;Qifa Ke;Yuan Yu;Yan Chen;Eliot Gillum
Affiliations:
Northwestern University;Microsoft Research Silicon Valley;Microsoft Research Silicon Valley;Microsoft Research Silicon Valley;Microsoft Research Silicon Valley;Northwestern University;Microsoft Corporation
Venue:
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Year:
2009

Abstract

Network security applications often require analyzing huge volumes of data to identify abnormal patterns or activities. The emergence of cloud-computing models opens up new opportunities to address this challenge by leveraging the power of parallel computing. In this paper, we design and implement a novel system called BotGraph to detect a new type of botnet spamming attack targeting major Web email providers. BotGraph uncovers the correlations among botnet activities by constructing large user-user graphs and looking for tightly connected subgraph components. This enables us to identify stealthy botnet users that are hard to detect when viewed in isolation. To deal with the huge data volume, we implement BotGraph as a distributed application on a computer cluster and explore a number of performance optimization techniques. Applied to two months of Hotmail logs containing over 500 million users, BotGraph successfully identified over 26 million botnet-created user accounts with a low false positive rate. Constructing and analyzing a 220GB Hotmail log takes around 1.5 hours on 240 machines. We believe both our graph-based approach and our implementations are generally applicable to a wide class of security applications for analyzing large datasets.
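
The graph-based idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation. The abstract does not say how edges are defined, so this sketch assumes that two users are linked when they share a login key (for example, an IP address) and that the edge weight counts the shared keys. The function names `build_user_graph` and `connected_components` and the `min_weight` threshold are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_user_graph(logins):
    """Return {(u, v): weight}; weight counts the keys shared by users u and v.

    `logins` is an iterable of (user, key) pairs. Treating the key as an
    IP address is an assumption of this sketch, not a detail from the abstract.
    """
    users_by_key = defaultdict(set)
    for user, key in logins:
        users_by_key[key].add(user)

    weights = defaultdict(int)
    for users in users_by_key.values():
        # Every pair of users seen on the same key gains one unit of weight.
        for u, v in combinations(sorted(users), 2):
            weights[(u, v)] += 1
    return dict(weights)


def connected_components(weights, min_weight):
    """Group users joined by edges of weight >= min_weight, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        if w >= min_weight:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [g for g in groups.values() if len(g) > 1]


# Toy login records: users a and b share two keys, c shares one with each.
logins = [
    ("a", "ip1"), ("b", "ip1"),
    ("a", "ip2"), ("b", "ip2"), ("c", "ip2"),
    ("d", "ip3"),
]
graph = build_user_graph(logins)
suspicious = connected_components(graph, min_weight=2)
```

Raising `min_weight` keeps only the more tightly connected groups, which is one simple way to separate coordinated accounts from users who share a single key by coincidence. The pairwise expansion is quadratic in the number of users per key, which hints at why the abstract describes a distributed implementation for logs of this size.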