Optimizing indirect branch prediction accuracy in virtual machine interpreters

Authors:
M. Anton Ertl; David Gregg
Affiliations:
TU Wien; Trinity College, Dublin
Venue:
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Year:
2003

Citing 15
Cited 27

Text compression

Text compression
Improving semi-static branch prediction by code replication

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Improving the accuracy of static branch prediction using branch correlation

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimizing an ANSI C interpreter with superoperators

POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Stack caching for interpreters

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A comparative analysis of schemes for correlated branch prediction

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The structure and performance of interpreters

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Optimizing direct threaded code by selective inlining

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Accurate indirect branch prediction

Proceedings of the 25th annual international symposium on Computer architecture
A code compression system based on pipelined interpreters

Software—Practice & Experience
Threaded code

Communications of the ACM
Optimising Bytecode Emulation for Prolog

PPDP '99 Proceedings of the International Conference PPDP'99 on Principles and Practice of Declarative Programming
Multi-stage Cascaded Prediction

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Vmgen: a generator of efficient virtual machine interpreters

Software—Practice & Experience

Retargeting JIT Compilers by using C-Compiler Generated Executable Code

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters

Proceedings of the international symposium on Code generation and optimization
Combining stack caching with dynamic superinstructions

Proceedings of the 2004 workshop on Interpreters, virtual machines and emulators
Code sharing among states for stack-caching interpreter

Proceedings of the 2004 workshop on Interpreters, virtual machines and emulators
Interpreting programs in static single assignment form

Proceedings of the 2004 workshop on Interpreters, virtual machines and emulators
Catenation and specialization for Tcl virtual machine performance

Proceedings of the 2004 workshop on Interpreters, virtual machines and emulators
Adapting branch-target buffer to improve the target predictability of Java code

ACM Transactions on Architecture and Code Optimization (TACO)
Mixed mode execution with context threading

CASCON '05 Proceedings of the 2005 conference of the Centre for Advanced Studies on Collaborative research
A study of the influence of coverage on the relationship between static and dynamic coupling metrics

Science of Computer Programming - Special issue: Principles and practices of programming in Java (PPPJ 2004)
The case for virtual register machines

Science of Computer Programming - Special issue on advances in interpreters, virtual machines and emulators (IVME'03)
YETI: a graduallY extensible trace interpreter

Proceedings of the 3rd international conference on Virtual execution environments
Optimizing indirect branch prediction accuracy in virtual machine interpreters

ACM Transactions on Programming Languages and Systems (TOPLAS)
Implementing fast JVM interpreters using Java itself

Proceedings of the 5th international symposium on Principles and practice of programming in Java
Improving the performance of object-oriented languages with dynamic predication of indirect jumps

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Analyzing the performance of code-copying virtual machines

Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Minimizing dependencies within generic classes for faster and smaller programs

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Virtual-Machine Abstraction and Optimization Techniques

Electronic Notes in Theoretical Computer Science (ENTCS)
Efficient interpretation using quickening

Proceedings of the 6th symposium on Dynamic languages
Interpreter instruction scheduling

CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Frequency estimation of virtual call targets for object-oriented programs

Proceedings of the 25th European conference on Object-oriented programming
Optimizing interpreters by tuning opcode orderings on virtual machines for modern architectures: or: how I learned to stop worrying and love hill climbing

Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Branch strategies to optimize decision trees for wide-issue architectures

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Swift: a register-based JIT compiler for embedded JVMs

VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern

Proceedings of the 26th ACM international conference on Supercomputing
Vectorization technology to improve interpreter performance

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Efficient interpreter optimizations for the JVM

Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools
Efficient hosted interpreters on the JVM

ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of the execution time in indirect branch mispredictions. Branch target buffers (BTBs) are the best widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2% to 50%. In this paper we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter run-time) variants of these techniques and compare the individual techniques with one another and with several combinations of them. These techniques can eliminate nearly all of the dispatch branch mispredictions, and have other benefits, resulting in speedups by a factor of up to 3.17 over efficient threaded-code interpreters, and speedups by a factor of up to 1.3 over techniques relying on superinstructions alone.
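
The effect of the two techniques on a BTB can be illustrated with a small simulation. The following Python sketch is not taken from the paper. It assumes a simplified BTB that has one entry per dispatch branch and always predicts the target taken last time, and it assumes a threaded-code interpreter in which every copy of a VM instruction's code ends in its own dispatch branch. The loop body A, B, A, GOTO and the names count_mispredictions, A1, A2 and B_A are hypothetical, chosen only for illustration.

```python
def count_mispredictions(copies, iterations=100):
    """Simulate dispatch branches for a VM loop under a last-target BTB.

    copies: the code copies executed in one loop iteration, in order.
            Each distinct name is one piece of interpreter code with its
            own indirect dispatch branch (and thus its own BTB entry).
    Returns (mispredictions, dispatches).
    """
    btb = {}  # dispatch branch (code copy) -> target it jumped to last time
    misses = 0
    dispatches = 0
    n = len(copies)
    for _ in range(iterations):
        for i in range(n):
            branch = copies[i]
            target = copies[(i + 1) % n]  # the loop wraps around
            if btb.get(branch) != target:
                misses += 1  # cold entry or wrong predicted target
            btb[branch] = target
            dispatches += 1
    return misses, dispatches


# Plain threaded code: both occurrences of A share one dispatch branch,
# whose target alternates between B and GOTO.
plain = ["A", "B", "A", "GOTO"]

# Replication: each occurrence of A gets its own copy (A1, A2),
# so every dispatch branch always jumps to the same target.
replicated = ["A1", "B", "A2", "GOTO"]

# Superinstruction: B followed by A is merged into one instruction B_A,
# which also removes one dispatch per iteration.
superinstruction = ["A", "B_A", "GOTO"]

print(count_mispredictions(plain))             # (202, 400)
print(count_mispredictions(replicated))        # (4, 400)
print(count_mispredictions(superinstruction))  # (3, 300)
```

In this toy model the shared dispatch branch of A mispredicts on every execution, which gives a misprediction rate of about 50%. With replication or a superinstruction, only the cold misses of the first iteration remain. Real BTBs have limited capacity and real programs have less regular control flow, so the numbers show only the mechanism, not the speedups reported in the abstract.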