Context Threading: A Flexible and Efficient Dispatch Technique for Virtual Machine Interpreters

Authors: Marc Berndl, Benjamin Vitale, Mathew Zaleski, Angela Demke Brown
Affiliation: University of Toronto (all authors)
Venue: Proceedings of the international symposium on Code generation and optimization
Year: 2005

Abstract

Direct-threaded interpreters dispatch bytecodes with indirect branches, yet deeply pipelined architectures depend on accurate branch prediction for performance. Because the virtual program's control flow correlates poorly with the hardware program counter, a mismatch we call the context problem, the hardware predicts direct threading's indirect branches poorly, which limits performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, which aligns the hardware PC with the virtual PC, so sequential control flow is predicted by the hardware return stack. Virtual branching instructions are converted to native branches, which puts the hardware's branch prediction resources to work. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium 4 and PowerPC 970 architectures. On the Pentium 4, our technique reduces mean mispredicted branches by 95%. On the PowerPC 970, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Because branch hazards are reduced, context threading reduces mean execution time by 25% for Java, and by 19% and 37% for OCaml on the Pentium 4 and PowerPC 970, respectively. We also combine context threading with a conservative inlining technique and find that its performance is comparable to that of selective inlining.
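
To make the baseline concrete, the following is a minimal sketch of direct-threaded dispatch, the technique the abstract identifies as poorly predicted. It is not code from the paper: the four-opcode stack machine, the `cell` type, and the functions `interp`, `thread_code`, and `run_example` are all hypothetical names chosen for illustration, and the sketch assumes a compiler that supports the GNU C "labels as values" extension (GCC or Clang). Each handler ends in its own indirect branch (`goto *`), and the target of that branch depends on the virtual program rather than on the hardware PC, which is the context problem described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes for illustration; not taken from the paper. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* A cell of threaded code is either a handler address or an immediate operand. */
typedef union { void *handler; intptr_t operand; } cell;

/*
 * With code == NULL, copies the handler addresses into table and returns 0.
 * Otherwise executes the direct-threaded program and returns the top of stack.
 * Relies on the GNU C labels-as-values extension (&&label, goto *ptr).
 */
static intptr_t interp(const cell *code, void **table)
{
    static void *const labels[OP_COUNT] = {
        [OP_PUSH] = &&do_push, [OP_ADD] = &&do_add,
        [OP_MUL]  = &&do_mul,  [OP_HALT] = &&do_halt
    };
    intptr_t stack[16];
    int sp = 0;
    const cell *vpc = code;            /* virtual program counter */

    if (code == NULL) {
        for (int i = 0; i < OP_COUNT; i++)
            table[i] = labels[i];
        return 0;
    }

/* Every handler ends with this indirect branch through the virtual PC. */
#define DISPATCH() goto *(vpc++)->handler

    DISPATCH();

do_push:
    assert(sp < 16);
    stack[sp++] = (vpc++)->operand;    /* operand follows the handler cell */
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}

/* Translates opcode/operand bytecode into threaded cells; returns the cell count. */
static size_t thread_code(const intptr_t *bytecode, size_t n, cell *out)
{
    void *table[OP_COUNT];
    size_t j = 0;

    interp(NULL, table);               /* fetch the handler addresses */
    for (size_t i = 0; i < n; i++) {
        intptr_t op = bytecode[i];
        out[j++].handler = table[op];
        if (op == OP_PUSH)
            out[j++].operand = bytecode[++i];
    }
    return j;
}

/* Runs the virtual program for (2 + 3) * 4. */
static intptr_t run_example(void)
{
    const intptr_t bytecode[] = {
        OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
    };
    cell code[16];

    thread_code(bytecode, sizeof bytecode / sizeof bytecode[0], code);
    return interp(code, NULL);
}
```

Calling `run_example()` evaluates (2 + 3) * 4 on the toy machine. Context threading itself is not sketched here, because it depends on generating native call and branch instructions at load time, which cannot be expressed in portable C.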