Optimizing interpreters by tuning opcode orderings on virtual machines for modern architectures: or: how I learned to stop worrying and love hill climbing

  • Authors:
  • Jason McCandless;David Gregg

  • Affiliations:
  • Trinity College Dublin, Lero@TCD;Trinity College Dublin, Lero@TCD

  • Venue:
  • Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
  • Year:
  • 2011

Quantified Score

Hi-index 0.03

Visualization

Abstract

Virtual machines (VMs) are commonly used to execute programs written in languages such as Java, Python and Lua. VMs are typically implemented using an interpreter, a JIT compiler, or some combination of the two. A long-standing question in the design of VM interpreters is whether it is worthwhile to reorder the cases in the main interpreter loop to improve code locality. We investigate this phenomenon using an iterative, feedback-directed approach. We show that the ordering of the cases in the interpreter loop has a significant impact on performance on recent processors. Using hardware performance counters, we demonstrate that the performance improvement is primarily the result of indirect branch prediction, not instruction cache locality. We propose a number of strategies to achieve better orderings, and evaluate these strategies in the Python and Lua virtual machine interpreters. We show speedups of up to 40%.