Optimizing interpreters by tuning opcode orderings on virtual machines for modern architectures: or: how I learned to stop worrying and love hill climbing

Authors:
Jason McCandless;David Gregg
Affiliations:
Trinity College Dublin, Lero@TCD;Trinity College Dublin, Lero@TCD
Venue:
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Year:
2011

Abstract

Virtual machines (VMs) are commonly used to execute programs written in languages such as Java, Python and Lua. VMs are typically implemented using an interpreter, a JIT compiler, or some combination of the two. A long-standing question in the design of VM interpreters is whether it is worthwhile to reorder the cases in the main interpreter loop to improve code locality. We investigate this question using an iterative, feedback-directed approach. We show that the ordering of the cases in the interpreter loop has a significant impact on performance on recent processors. Using hardware performance counters, we demonstrate that the performance improvement is primarily the result of improved indirect branch prediction, not instruction cache locality. We propose a number of strategies to achieve better orderings, and evaluate these strategies in the Python and Lua virtual machine interpreters. We show speedups of up to 40%.
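
As a rough illustration of the iterative, feedback-directed search the abstract and title refer to, the following Python sketch runs a swap-based hill climb over an opcode ordering. It is not the authors' implementation. The opcode names, the `PAIR_COUNTS` table, and `toy_cost` are all hypothetical placeholders; in a real setting the cost of an ordering would presumably be obtained by rebuilding the interpreter with that case order and timing a benchmark, which this sketch does not attempt.

```python
import random


def hill_climb(order, cost, iterations=2000, seed=0):
    """Swap-based hill climbing over a permutation.

    A random pair of positions is swapped each step; the swap is kept
    only if it strictly lowers the cost, otherwise it is reverted.
    """
    rng = random.Random(seed)
    best = list(order)
    best_cost = cost(best)
    for _ in range(iterations):
        i, j = rng.sample(range(len(best)), 2)
        best[i], best[j] = best[j], best[i]
        candidate_cost = cost(best)
        if candidate_cost < best_cost:
            best_cost = candidate_cost
        else:
            # No improvement: undo the swap.
            best[i], best[j] = best[j], best[i]
    return best, best_cost


# Hypothetical opcode set and synthetic pair frequencies (assumptions,
# not data from the paper).
OPCODES = ["LOAD", "STORE", "ADD", "SUB", "JUMP", "CALL", "RETURN", "CMP"]
PAIR_COUNTS = {
    ("LOAD", "ADD"): 50,
    ("ADD", "STORE"): 40,
    ("CMP", "JUMP"): 30,
    ("LOAD", "CMP"): 20,
    ("CALL", "RETURN"): 10,
}


def toy_cost(order):
    """Placeholder cost: frequency-weighted distance between opcode pairs.

    This stands in for a measured run time; it is not a model of
    branch prediction or cache behaviour.
    """
    position = {op: k for k, op in enumerate(order)}
    return sum(
        count * abs(position[a] - position[b])
        for (a, b), count in PAIR_COUNTS.items()
    )


start = list(OPCODES)
best, best_cost = hill_climb(start, toy_cost)
print("initial cost:", toy_cost(start))
print("final order:", best, "cost:", best_cost)
```

Because only improving swaps are accepted, the search can stall in a local minimum; restarting from several random orderings is one common way to reduce that risk, at the price of more measurements.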