Derive: a tool that automatically reverse-engineers instruction encodings

  • Authors:
  • Dawson R. Engler;Wilson C. Hsieh

  • Affiliations:
  • Computer Systems Laboratory, Stanford University, Stanford, CA;Department of Computer Science, University of Utah, Salt Lake City, UT

  • Venue:
  • DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
  • Year:
  • 2000

Quantified Score

Hi-index 0.01

Visualization

Abstract

Many binary tools, such as disassemblers, dynamic code generation systems, and executable code rewriters, need to understand how machine instructions are encoded. Unfortunately, specifying such encodings is tedious and error-prone. Users must typically specify thousands of details of instruction layout, such as opcode and field locations values, legal operands, and jump offset encodings. We have built a tool called DERIVE that extracts these details from existing software: the system assembler. Users need only provide the assembly syntax for the instructions for which they want encodings. DERIVE automatically reverse-engineers instruction encoding knowledge from the assembler by feeding it permutations of instructions and doing equation solving on the output.DERIVE is robust and general. It derives instruction encodings for SPARC, MIPS, Alpha, PowerPC, ARM, and x86. In the last case, it handles variable-sized instructions, large instructions, instruction encodings determined by operand size, and other CISC features. DERIVE is also remarkably simple: it is a factor of 6 smaller than equivalent, more traditional systems. Finally, its declarative specifications eliminate the mis-specification errors that plague previous approaches, such as illegal registers used as operands or incorrect field offsets and sizes. This paper discusses our current DERIVE prototype, explains how it computes instruction encodings, and also discusses the more general implications of the ability to extract functionality from installed software.