Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply pipelined architectures rely on branch prediction for performance. Because the virtual program's control flow correlates poorly with the hardware program counter, a mismatch we call the context problem, the hardware predicts direct threading's indirect branches poorly, which limits performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC, so sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium 4, our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.
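To make the context problem concrete, the following is a minimal toy model in Python, not the interpreters or the measurement method evaluated here. It assumes a simple last-target predictor with one entry per opcode body for direct threading, and an idealized return stack for context threading; the names `PROGRAM`, `direct_threaded_mispredicts`, and `context_threaded_mispredicts` are hypothetical and chosen only for illustration.

```python
# Toy model (assumption): a virtual loop of straight-line instructions.
# "load" appears twice with different successors, which is what defeats
# a predictor that only sees the address of load's dispatch branch.
PROGRAM = ["load", "add", "load", "mul", "store"]


def direct_threaded_mispredicts(program, iterations):
    """Count mispredicted dispatches under direct threading.

    Each opcode body ends in its own indirect jump. The modeled predictor
    remembers, per opcode body, only the last target that jump went to.
    """
    last_target = {}  # opcode body -> most recent successor opcode
    misses = 0
    total = 0
    for _ in range(iterations):
        for i, op in enumerate(program):
            successor = program[(i + 1) % len(program)]
            if last_target.get(op) != successor:
                misses += 1
            last_target[op] = successor
            total += 1
    return misses, total


def context_threaded_mispredicts(program, iterations):
    """Count mispredicted dispatches under context threading.

    Each virtual instruction is reached by a native call from its own call
    site, so the body's return is predicted by the return stack, which
    holds the address of the call site that follows.
    """
    return_stack = []
    misses = 0
    total = 0
    for _ in range(iterations):
        for i, _op in enumerate(program):
            return_site = i + 1
            return_stack.append(return_site)  # native call pushes return address
            predicted = return_stack.pop()    # return is predicted from the stack
            if predicted != return_site:
                misses += 1
            total += 1
    return misses, total


if __name__ == "__main__":
    print("direct threading :", direct_threaded_mispredicts(PROGRAM, 100))
    print("context threading:", context_threaded_mispredicts(PROGRAM, 100))
```

In this model the dispatch branch at the end of `load` mispredicts on every execution, because its target alternates between `add` and `mul`, while the call-and-return scheme gives each virtual instruction its own native call site. The model ignores virtual branches, predictor capacity, and return-stack depth, so it illustrates the mechanism only and says nothing about the measured speedups.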