Optimizing indirect branch prediction accuracy in virtual machine interpreters

Authors:
Kevin Casey; M. Anton Ertl; David Gregg
Affiliations:
Trinity College Dublin, Dublin, Ireland; TU Wien; Trinity College Dublin, Dublin, Ireland
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2007

Abstract

Interpreters designed for efficiency execute a huge number of indirect branches and can spend more than half of their execution time on indirect branch mispredictions. Branch target buffers (BTBs) are the most widely available form of indirect branch prediction; however, their prediction accuracy for existing interpreters is only 2% to 50%. In this article we investigate two methods for improving the prediction accuracy of BTBs for interpreters: replicating virtual machine (VM) instructions and combining sequences of VM instructions into superinstructions. We investigate static (interpreter build-time) and dynamic (interpreter runtime) variants of these techniques, and compare the variants and several combinations of them. To show their generality, we have implemented these optimizations in VMs for both Java and Forth. These techniques can eliminate nearly all of the dispatch branch mispredictions and have other benefits, resulting in speedups by a factor of up to 4.55 over efficient threaded-code interpreters, and by a factor of up to 1.34 over techniques relying on dynamic superinstructions alone.
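
To illustrate why replication and superinstructions can help a BTB, the following is a minimal sketch of a toy model, not the authors' implementation or measurement setup. It assumes threaded-code dispatch, where each VM instruction handler ends with its own indirect branch, and an idealized BTB that simply predicts the target it last saw for that branch. The function name `count_mispredictions`, the handler names (`A`, `B`, `GOTO`, `A1`, `A2`, `A_B`), and the example loop are all hypothetical.

```python
def count_mispredictions(loop_body, iterations):
    """Count BTB misses for a VM loop under a simplified model.

    Assumptions (illustrative only):
    - each handler name in `loop_body` is one branch site, because in
      threaded code every handler ends with its own indirect branch;
    - the BTB has one entry per site and predicts the last observed target;
    - the BTB has unlimited capacity and no aliasing.
    """
    btb = {}  # branch site (handler name) -> last observed target
    misses = 0
    dispatches = 0
    n = len(loop_body)
    for i in range(iterations * n):
        site = loop_body[i % n]          # handler whose dispatch branch runs
        target = loop_body[(i + 1) % n]  # handler it actually jumps to
        if btb.get(site) != target:
            misses += 1
        btb[site] = target
        dispatches += 1
    return misses, dispatches


# One shared handler for A: its branch alternates between B and GOTO,
# so the "last target" prediction for A is wrong every time.
plain = count_mispredictions(["A", "B", "A", "GOTO"], 1000)

# Replication: two copies of A, each with a single, stable successor.
replicated = count_mispredictions(["A1", "B", "A2", "GOTO"], 1000)

# Superinstruction: A followed by B is fused into one handler, which also
# removes one dispatch per loop iteration.
fused = count_mispredictions(["A_B", "A", "GOTO"], 1000)

for name, (misses, dispatches) in [("plain", plain),
                                   ("replicated", replicated),
                                   ("fused", fused)]:
    print(f"{name}: {misses} misses in {dispatches} dispatches")
```

In this toy model the shared handler mispredicts on about half of all dispatches, while both the replicated and the fused variants miss only on the first pass through the loop. Real BTBs have finite capacity and the extra code affects instruction caches, so this sketch shows only the direction of the effect, not its size.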