Polyglot: automatic extraction of protocol message format using dynamic binary analysis

Authors:
Juan Caballero; Heng Yin; Zhenkai Liang; Dawn Song
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA & College of William and Mary, Williamsburg, VA; Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA & UC Berkeley, Berkeley, CA
Venue:
Proceedings of the 14th ACM conference on Computer and communications security
Year:
2007

Citing 20
Cited 49

Secure program execution via dynamic information flow tracking

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Violating Assumptions with Fuzzing

IEEE Security and Privacy
Improving network applications security: a new heuristic to generate stress testing data

GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation
ACAS: automated construction of application signatures

Proceedings of the 2005 ACM SIGCOMM workshop on Mining network data
Vigilante: end-to-end containment of internet worms

Proceedings of the twentieth ACM symposium on Operating systems principles
ScriptGen: an automated script generation tool for honeyd

ACSAC '05 Proceedings of the 21st Annual Computer Security Applications Conference
The species per path approach to search-based test data generation

Proceedings of the 2006 international symposium on Software testing and analysis
Extracting Output Formats from Executables

WCRE '06 Proceedings of the 13th Working Conference on Reverse Engineering
Semi-automated discovery of application session structure

Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
binpac: a yacc for writing application protocol parsers

Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Unexpected means of protocol inference

Proceedings of the 6th ACM SIGCOMM conference on Internet measurement
Replayer: automatic protocol replay by binary analysis

Proceedings of the 13th ACM conference on Computer and communications security
Minos: Architectural support for protecting control data

ACM Transactions on Architecture and Code Optimization (TACO)
Argos: an emulator for fingerprinting zero-day attacks for advertised honeypots with automatic signature generation

Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
A first look at modern enterprise traffic

IMC '05 Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement
Understanding data lifetime via whole system simulation

SSYM'04 Proceedings of the 13th conference on USENIX Security Symposium - Volume 13
Dynamic application-layer protocol analysis for network intrusion detection

USENIX-SS'06 Proceedings of the 15th conference on USENIX Security Symposium - Volume 15
Panorama: capturing system-wide information flow for malware detection and analysis

Proceedings of the 14th ACM conference on Computer and communications security
Discoverer: automatic protocol reverse engineering from network traces

SS'07 Proceedings of the 16th USENIX Security Symposium
Towards automatic discovery of deviations in binary implementations with applications to error detection and fingerprint generation

SS'07 Proceedings of the 16th USENIX Security Symposium

Deriving input syntactic structure from execution

Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of software engineering
Ether: malware analysis via hardware virtualization extensions

Proceedings of the 15th ACM conference on Computer and communications security
Towards automatic reverse engineering of software security configurations

Proceedings of the 15th ACM conference on Computer and communications security
Tupni: automatic reverse engineering of input formats

Proceedings of the 15th ACM conference on Computer and communications security
BitBlaze: A New Approach to Computer Security via Binary Analysis

ICISS '08 Proceedings of the 4th International Conference on Information Systems Security
Taint-based directed whitebox fuzzing

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
Loop-extended symbolic execution on binary programs

Proceedings of the eighteenth international symposium on Software testing and analysis
Polymorphing Software by Randomizing Data Structure Layout

DIMVA '09 Proceedings of the 6th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering

Proceedings of the 16th ACM conference on Computer and communications security
Towards Generating High Coverage Vulnerability-Based Signatures with Protocol-Level Constraint-Guided Exploration

RAID '09 Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection
Automated Behavioral Fingerprinting

RAID '09 Proceedings of the 12th International Symposium on Recent Advances in Intrusion Detection
ReFormat: automatic reverse engineering of encrypted messages

ESORICS'09 Proceedings of the 14th European conference on Research in computer security
Thwarting zero-day polymorphic worms with network-level length-based signature generation

IEEE/ACM Transactions on Networking (TON)
Automatically identifying critical input regions and code in applications

Proceedings of the 19th international symposium on Software testing and analysis
DDE: dynamic data structure excavation

Proceedings of the first ACM Asia-Pacific Workshop on Systems
Towards automatic inference of task hierarchies in complex systems

HotDep'08 Proceedings of the Fourth conference on Hot topics in system dependability
Inference and analysis of formal models of botnet command and control protocols

Proceedings of the 17th ACM conference on Computer and communications security
Reverse engineering for mobile systems forensics with Ares

Proceedings of the 2010 ACM workshop on Insider threats
Learning automata representation of network protocol by grammar induction

WISM'10 Proceedings of the 2010 international conference on Web information systems and mining
Efficient file fuzz testing using automated analysis of binary file format

Journal of Systems Architecture: the EUROMICRO Journal
Automatically complementing protocol specifications from network traces

EWDC '11 Proceedings of the 13th European Workshop on Dependable Computing
Checksum-Aware Fuzzing Combined with Dynamic Taint Analysis and Symbolic Execution

ACM Transactions on Information and System Security (TISSEC)
Checking conformance of a producer and a consumer

Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Inferring protocol state machine from network traces: a probabilistic approach

ACNS'11 Proceedings of the 9th international conference on Applied cryptography and network security
Forensic triage for mobile phones with DEC0DE

SEC'11 Proceedings of the 20th USENIX conference on Security
MACE: model-inference-assisted concolic exploration for protocol and vulnerability discovery

SEC'11 Proceedings of the 20th USENIX conference on Security
Detection and analysis of cryptographic data inside software

ISC'11 Proceedings of the 14th international conference on Information security
V2E: combining hardware virtualization and software emulation for transparent and extensible malware analysis

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Using purpose capturing signatures to defeat computer virus mutating

ISPEC'10 Proceedings of the 6th international conference on Information Security Practice and Experience
Automated identification of cryptographic primitives in binary programs

RAID'11 Proceedings of the 14th international conference on Recent Advances in Intrusion Detection
Efficient and stealthy instruction tracing and its applications in automated malware analysis: open problems and challenges

iNetSec'11 Proceedings of the 2011 IFIP WG 11.4 international conference on Open Problems in Network Security
Bridging the interoperability gap: overcoming combined application and middleware heterogeneity

Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Challenges in network application identification

LEET'12 Proceedings of the 5th USENIX conference on Large-Scale Exploits and Emergent Threats
Network protocol discovery and analysis via live interaction

EvoApplications'12 Proceedings of the 2012 European conference on Applications of Evolutionary Computation
Undangle: early detection of dangling pointers in use-after-free and double-free vulnerabilities

Proceedings of the 2012 International Symposium on Software Testing and Analysis
B@bel: leveraging email delivery for spam mitigation

Security'12 Proceedings of the 21st USENIX conference on Security symposium
Learning stateful models for network honeypots

Proceedings of the 5th ACM workshop on Security and artificial intelligence
StegoTorus: a camouflage proxy for the Tor anonymity system

Proceedings of the 2012 ACM conference on Computer and communications security
PeerPress: utilizing enemies' P2P strength against them

Proceedings of the 2012 ACM conference on Computer and communications security
Learning fine-grained structured input for memory corruption detection

ISC'12 Proceedings of the 15th international conference on Information Security
Bridging the interoperability gap: overcoming combined application and middleware heterogeneity

Proceedings of the 12th International Middleware Conference
Intelligent network security assessment with modeling and analysis of attack patterns

Security and Communication Networks
Augmenting vulnerability analysis of binary code

Proceedings of the 28th Annual Computer Security Applications Conference
Towards network containment in malware analysis systems

Proceedings of the 28th Annual Computer Security Applications Conference
Automatic protocol reverse-engineering: Message format extraction and field semantics inference

Computer Networks: The International Journal of Computer and Telecommunications Networking
Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection

ACM Transactions on Information and System Security (TISSEC)
Obfuscation resilient binary code reuse through trace-oriented programming

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
Tappan Zee (north) bridge: mining memory accesses for introspection

Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security
ProVeX: detecting botnets with encrypted command and control channels

DIMVA'13 Proceedings of the 10th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Protocol reverse engineering, the process of extracting the application-level protocol used by an implementation without access to the protocol specification, is important for many network security applications. Recent work [17] has proposed protocol reverse engineering by clustering network traces. That kind of approach is limited by the lack of semantic information in network traces. In this paper we propose a new approach using program binaries. Our approach, shadowing, uses dynamic analysis and is based on a unique intuition: the way that an implementation of the protocol processes the received application data reveals a wealth of information about the protocol message format. We have implemented our approach in a system called Polyglot and evaluated it extensively using real-world implementations of five different protocols: DNS, HTTP, IRC, Samba and ICQ. We compare our results with the manually crafted message formats included in Wireshark, one of the state-of-the-art protocol analyzers. The differences we find are small and usually due to different implementations handling fields in different ways. Finding such differences between implementations is an added benefit, as they are important for problems such as fingerprint generation, fuzzing, and error detection.
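
The intuition in the abstract can be pictured with a toy example. The sketch below is not Polyglot's implementation, which the abstract describes only as dynamic analysis of program binaries. It is a minimal Python illustration under stated assumptions: a hypothetical parser (toy_parser) reads a message whose bytes are tagged with their offsets, and every comparison it makes against an input byte is logged. All names here (TaintedByte, toy_parser, infer_delimiters, min_compared) are invented for illustration.

```python
class TaintedByte:
    """One input byte tagged with its offset in the received message."""

    def __init__(self, value, offset, log):
        self.value = value
        self.offset = offset
        self.log = log

    def equals(self, constant):
        # Log every comparison the parser makes against this input byte.
        matched = self.value == constant
        self.log.append((self.offset, constant, matched))
        return matched


def toy_parser(message, log):
    """Hypothetical implementation under observation: splits on spaces."""
    data = [TaintedByte(b, i, log) for i, b in enumerate(message)]
    fields, start = [], 0
    for i, byte in enumerate(data):
        if byte.equals(ord(" ")):
            fields.append(message[start:i])
            start = i + 1
    fields.append(message[start:])
    return fields


def infer_delimiters(log, min_compared=3):
    """Guess separators from the comparison log alone.

    A constant compared against many distinct input offsets is treated as
    a candidate separator; the offsets where it matched are reported as
    field boundaries. The threshold min_compared is an arbitrary assumption.
    """
    seen = {}
    for offset, constant, matched in log:
        entry = seen.setdefault(constant, {"offsets": set(), "hits": []})
        entry["offsets"].add(offset)
        if matched:
            entry["hits"].append(offset)
    return {
        chr(constant): entry["hits"]
        for constant, entry in seen.items()
        if len(entry["offsets"]) >= min_compared
    }


log = []
fields = toy_parser(b"GET /index.html HTTP/1.1", log)
delimiters = infer_delimiters(log)
print(fields)
print(delimiters)
```

The inference step looks only at the comparison log, never at the parser's source or output, which loosely mirrors the idea that watching how an implementation processes received data can expose the message format.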