Derive: a tool that automatically reverse-engineers instruction encodings

Authors:
Dawson R. Engler; Wilson C. Hsieh
Affiliations:
Computer Systems Laboratory, Stanford University, Stanford, CA; Department of Computer Science, University of Utah, Salt Lake City, UT
Venue:
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Year:
2000

Citing (18 works this paper cites):

Superoptimizer: a look at the smallest program
    ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems

An approach to genuine dynamic linking
    Software: Practice & Experience

Efficient software-based fault isolation
    SOSP '93 Proceedings of the fourteenth ACM symposium on Operating systems principles

ATOM: a system for building customized program analysis tools
    PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation

Optimizing dynamically-dispatched calls with run-time type feedback
    PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation

Rewriting executable files to measure program behavior
    Software: Practice & Experience

The design and evolution of C++

The mythical man-month (anniversary ed.)

Optimizing ML with run-time code generation
    PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation

VCODE: a retargetable, extensible, very fast dynamic code generation system
    PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation

A general approach for run-time specialization and its application to C
    POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages

DPF: fast, flexible message demultiplexing using dynamic code generation
    Conference proceedings on Applications, technologies, architectures, and protocols for computer communications

Automatic checking of instruction specifications
    ICSE '97 Proceedings of the 19th international conference on Software engineering

Specifying representations of machine instructions
    ACM Transactions on Programming Languages and Systems (TOPLAS)

Reverse interpretation + mutation analysis = automatic retargeting
    Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation

Annotation-directed run-time specialization in C
    PEPM '97 Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation

C and tcc: a language and compiler for dynamic code generation
    ACM Transactions on Programming Languages and Systems (TOPLAS)

Efficient implementation of the Smalltalk-80 system
    POPL '84 Proceedings of the 11th ACM SIGACT-SIGPLAN symposium on Principles of programming languages

Cited (6 works that cite this paper):

Automatic derivation of compiler machine descriptions
    ACM Transactions on Programming Languages and Systems (TOPLAS)

Reverse-Engineering Instruction Encodings
    Proceedings of the General Track: 2002 USENIX Annual Technical Conference

A brief history of just-in-time
    ACM Computing Surveys (CSUR)

Automatic instruction scheduler retargeting by reverse-engineering
    Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation

N-version disassembly: differential testing of x86 disassemblers
    Proceedings of the 19th international symposium on Software testing and analysis

Logarithmic-Time FPGA Bitstream Analysis: A Step Towards JIT Hardware Compilation
    ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Abstract

Many binary tools, such as disassemblers, dynamic code generation systems, and executable code rewriters, need to understand how machine instructions are encoded. Unfortunately, specifying such encodings is tedious and error-prone. Users must typically specify thousands of details of instruction layout, such as opcode and field locations and values, legal operands, and jump offset encodings. We have built a tool called DERIVE that extracts these details from existing software: the system assembler. Users need only provide the assembly syntax for the instructions for which they want encodings. DERIVE automatically reverse-engineers instruction encoding knowledge from the assembler by feeding it permutations of instructions and solving equations over the output.

DERIVE is robust and general. It derives instruction encodings for SPARC, MIPS, Alpha, PowerPC, ARM, and x86. In the last case, it handles variable-sized instructions, large instructions, instruction encodings determined by operand size, and other CISC features. DERIVE is also remarkably simple: it is a factor of 6 smaller than equivalent, more traditional systems. Finally, its declarative specifications eliminate the mis-specification errors that plague previous approaches, such as illegal registers used as operands or incorrect field offsets and sizes. This paper discusses our current DERIVE prototype, explains how it computes instruction encodings, and also discusses the more general implications of the ability to extract functionality from installed software.
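
The core idea in the abstract (feed the assembler permuted instructions, then infer the layout from how the output bits change) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical 16-bit toy encoding, and `toy_assemble` is an invented stand-in for the system assembler; `REGS` and `derive` are likewise illustrative names. It also assumes each register field holds the register number directly, which a real tool would have to establish by solving for it.

```python
from typing import Callable, Dict, List

# Hypothetical register names for the toy machine (an assumption).
REGS: List[str] = [f"r{i}" for i in range(8)]


def toy_assemble(mnemonic: str, rd: str, rs: str) -> int:
    """Invented stand-in for the system assembler.

    Toy layout (assumed): opcode in bits 15..12, rd in bits 8..6,
    rs in bits 2..0, all other bits zero.
    """
    opcodes = {"add": 0x1, "sub": 0x2}
    return (opcodes[mnemonic] << 12) | (REGS.index(rd) << 6) | REGS.index(rs)


def derive(assemble: Callable[[str, str, str], int],
           mnemonic: str, regs: List[str], width: int = 16) -> Dict:
    """Infer operand field positions by varying one operand at a time."""
    base_ops = [regs[0], regs[0]]
    base = assemble(mnemonic, *base_ops)
    fields = []
    used = 0
    for pos in range(len(base_ops)):
        mask = 0
        samples = []
        for reg in regs:
            ops = list(base_ops)
            ops[pos] = reg
            enc = assemble(mnemonic, *ops)
            mask |= enc ^ base          # bits that depend on this operand
            samples.append(enc)
        # Position of the lowest bit that changed (0 if nothing changed).
        shift = (mask & -mask).bit_length() - 1 if mask else 0
        # Check the simplest hypothesis: field value == register index.
        linear = all(((enc & mask) >> shift) == i
                     for i, enc in enumerate(samples))
        fields.append({"mask": mask, "shift": shift, "linear": linear})
        used |= mask
    full = (1 << width) - 1
    return {
        "opcode_bits": base & ~used & full,  # constant bits of the encoding
        "fixed_mask": ~used & full,          # bits no operand influences
        "fields": fields,
    }


result = derive(toy_assemble, "add", REGS)
print(hex(result["opcode_bits"]), [hex(f["mask"]) for f in result["fields"]])
```

The sketch treats the assembler as a black box, which is the property the abstract emphasizes: the user supplies only assembly syntax and the layout is recovered from the emitted bits. Variable-length instructions, immediates, and jump offsets would need considerably more machinery than is shown here.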