Virtual machines (VMs) are commonly used to execute programs written in languages such as Java, Python and Lua. VMs are typically implemented using an interpreter, a JIT compiler, or some combination of the two. A long-standing question in the design of VM interpreters is whether it is worthwhile to reorder the cases in the main interpreter loop to improve code locality. We investigate this question using an iterative, feedback-directed approach. We show that the ordering of the cases in the interpreter loop has a significant impact on performance on recent processors. Using hardware performance counters, we demonstrate that the improvement comes primarily from better indirect branch prediction, not from instruction cache locality. We propose a number of strategies for finding better orderings and evaluate them in the Python and Lua virtual machine interpreters, showing speedups of up to 40%.
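As a rough illustration of what an iterative, feedback-directed search over case orderings might look like, the sketch below hill-climbs over permutations of interpreter cases, keeping a swap only when a measured cost goes down. It is not the method evaluated in this work, whose strategies the abstract does not describe. The names `search_ordering` and `toy_cost`, the opcode names, and the weights are all hypothetical; in practice `measure` would rebuild the interpreter with the candidate ordering and return a benchmark time or a hardware counter reading.

```python
import random
from typing import Callable, List, Sequence


def search_ordering(cases: Sequence[str],
                    measure: Callable[[List[str]], float],
                    iterations: int = 200,
                    seed: int = 0) -> List[str]:
    """Hill-climb over case orderings (needs at least two cases).

    `measure` is an assumed callback returning a cost for an ordering,
    where lower is better.
    """
    rng = random.Random(seed)
    best = list(cases)
    best_cost = measure(best)
    for _ in range(iterations):
        # Propose a neighbour by swapping two randomly chosen cases.
        candidate = best[:]
        i, j = rng.sample(range(len(candidate)), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cost = measure(candidate)
        # Feedback step: keep the candidate only if it measured better.
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def toy_cost(order: List[str]) -> float:
    """Hypothetical stand-in for a real measurement.

    Penalises frequently executed cases that sit late in the ordering.
    The weights are invented for illustration only.
    """
    weight = {"LOAD_FAST": 50, "STORE_FAST": 30,
              "BINARY_ADD": 15, "RETURN_VALUE": 5}
    return sum(weight.get(op, 0) * pos for pos, op in enumerate(order))


if __name__ == "__main__":
    ops = ["RETURN_VALUE", "BINARY_ADD", "STORE_FAST", "LOAD_FAST", "NOP"]
    print(search_ordering(ops, toy_cost))
```

Because the search only accepts strict improvements, the returned ordering never costs more than the starting one under the supplied `measure`. With a real, noisy measurement, each candidate would likely need repeated runs before being accepted.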