Learning to analyze binary computer code

Authors:
Nathan Rosenblum; Xiaojin Zhu; Barton Miller; Karen Hunt
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison; Computer Sciences Department, University of Wisconsin-Madison; Computer Sciences Department, University of Wisconsin-Madison; National Security Agency
Venue:
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Year:
2008

Abstract

We present a novel application of structured classification: identifying function entry points (FEPs, the starting byte of each function) in program binaries. Such identification is the crucial first step in analyzing much malicious, commercial, and legacy software, which lacks the full symbol information that specifies FEPs. Existing pattern-matching FEP detection techniques are insufficient because compiler and link-time optimizations introduce variable instruction sequences. We formulate the FEP identification problem as structured classification using Conditional Random Fields (CRFs). Our CRFs incorporate both idiom features, which represent the sequence of instructions surrounding FEPs, and control flow structure features, which represent the interaction among FEPs. Together, these features allow us to jointly label all FEPs in the binary. We perform feature selection and present an approximate inference method for massive program binaries. We evaluate our models on a large set of real-world test binaries, showing that they dramatically outperform two existing, standard disassemblers.
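
The idea of jointly labeling FEPs can be illustrated with a minimal sketch. It is not the authors' model: the candidate offsets, the idiom string, the call edge, and all weights below are hypothetical, and exhaustive search over labelings stands in for the approximate inference the abstract mentions. Each candidate offset receives a unary score from the idioms observed there, and a pairwise score rewards labeling both ends of a call edge as function entries.

```python
from itertools import product

# Hypothetical toy input: candidate offsets and the idioms observed at each one.
CANDIDATES = {
    0x10: {"push ebp|mov ebp,esp"},
    0x24: set(),                      # no recognizable prologue idiom here
    0x40: {"push ebp|mov ebp,esp"},
}

# Hypothetical weights; a real model would learn these from labeled binaries.
IDIOM_WEIGHTS = {"push ebp|mov ebp,esp": 2.0}
BIAS = -1.0         # unary penalty for labeling any offset as an FEP
CALL_WEIGHT = 1.5   # pairwise reward when caller and call target are both FEPs


def score(labels, calls):
    """Unnormalized log-score of one joint labeling (offset -> 0 or 1)."""
    total = 0.0
    for offset, is_fep in labels.items():
        if is_fep:
            total += BIAS
            total += sum(IDIOM_WEIGHTS.get(i, 0.0) for i in CANDIDATES[offset])
    for src, dst in calls:
        if labels[src] and labels[dst]:
            total += CALL_WEIGHT
    return total


def map_labeling(calls):
    """Exhaustively search all labelings; feasible only for tiny toy inputs."""
    offsets = sorted(CANDIDATES)
    best = max(
        product([0, 1], repeat=len(offsets)),
        key=lambda ys: score(dict(zip(offsets, ys)), calls),
    )
    return dict(zip(offsets, best))


if __name__ == "__main__":
    print(map_labeling(calls=[]))                # idioms only
    print(map_labeling(calls=[(0x10, 0x24)]))    # idioms plus one call edge
```

With idiom evidence alone, offset 0x24 is left unlabeled in this toy; adding the call edge from 0x10 tips the joint labeling toward marking it as an entry point, which is the kind of interaction a purely local pattern matcher cannot capture.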