Automatic protocol reverse-engineering: Message format extraction and field semantics inference

Authors:
Juan Caballero; Dawn Song
Affiliations:
IMDEA Software Institute, Madrid, Spain; University of California, Berkeley, CA, USA
Venue:
Computer Networks: The International Journal of Computer and Telecommunications Networking
Year:
2013


Abstract

Understanding the command-and-control (C&C) protocol used by a botnet is crucial for anticipating its repertoire of nefarious activity. However, the C&C protocols of botnets, like many other application-layer protocols, are undocumented. Automatic protocol reverse-engineering techniques make it possible to understand undocumented protocols and are important for many security applications, including the analysis of and defense against botnets. For example, they enable active botnet infiltration, where a security analyst rewrites messages sent and received by a bot in order to contain malicious activity and to give the botmaster the illusion of successful and unhampered operation. In this work, we propose a novel approach to automatic protocol reverse engineering based on dynamic program binary analysis. Unlike previous work, which examines network traffic, we leverage the availability of a program that implements the protocol. Our approach extracts more accurate and complete protocol information and enables the analysis of encrypted protocols. Our automatic protocol reverse-engineering techniques extract the message format and field semantics of protocol messages sent and received by an application that implements an unknown protocol specification. We implement our techniques in a tool called Dispatcher and use it to analyze the previously undocumented C&C protocol of MegaD, a spam botnet that at its peak produced one third of the spam on the Internet.
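
To give a rough intuition for why observing a program that implements the protocol can reveal message structure, the following toy sketch records which byte ranges a parser reads and treats each distinct range as a candidate field. It is an illustration only and not Dispatcher's method: the `TracedBuffer` class, the `toy_parser` function, and the three-field message layout (1-byte type, 2-byte length, payload) are all hypothetical, and real dynamic binary analysis works on executed machine instructions rather than on an instrumented Python object.

```python
from typing import Dict, List, Tuple


class TracedBuffer:
    """Hypothetical wrapper that logs the byte range touched by every read."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.accesses: List[Tuple[int, int]] = []  # (start, end) per read

    def read(self, start: int, length: int) -> bytes:
        # Record the half-open range [start, start + length) before returning it.
        self.accesses.append((start, start + length))
        return self.data[start:start + length]


def toy_parser(buf: TracedBuffer) -> Dict[str, object]:
    """Stand-in for the program under analysis.

    Assumed layout (invented for this example):
    1-byte type, 2-byte big-endian length, then `length` payload bytes.
    """
    msg_type = buf.read(0, 1)[0]
    length = int.from_bytes(buf.read(1, 2), "big")
    payload = buf.read(3, length)
    return {"type": msg_type, "length": length, "payload": payload}


def infer_fields(buf: TracedBuffer) -> List[Tuple[int, int]]:
    """Treat each distinct range the parser read as one candidate field."""
    return sorted(set(buf.accesses))


message = bytes([0x01, 0x00, 0x05]) + b"hello"
traced = TracedBuffer(message)
parsed = toy_parser(traced)
fields = infer_fields(traced)

for start, end in fields:
    print(f"candidate field: bytes [{start}, {end})")
```

The point of the sketch is that the field boundaries come from how the program consumes the message, not from statistics over many captured messages, which is the contrast the abstract draws with approaches that examine only network traffic.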