Bytecode fetch optimization for a Java interpreter

Authors:
Kazunori Ogata; Hideaki Komatsu; Toshio Nakatani
Affiliations:
IBM Tokyo Research Laboratory, Yamato-shi, Kanagawa, Japan; IBM Tokyo Research Laboratory, Yamato-shi, Kanagawa, Japan; IBM Tokyo Research Laboratory, Yamato-shi, Kanagawa, Japan
Venue:
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Year:
2002

Abstract

Interpreters play an important role in many languages, and their performance is particularly critical for the popular language Java. Interpreter performance matters even for high-performance virtual machines that employ just-in-time compiler technology, because there are advantages in delaying the start of compilation and in reducing the number of target methods to be compiled. Many techniques have been proposed to improve the performance of various interpreters, but none of them has fully addressed two issues inherent to interpreters running on superscalar processors: redundant memory accesses and the overhead of indirect branches. These issues are especially serious for Java because each bytecode is typically one or a few bytes long and the execution routine for each bytecode is also short, owing to the low-level, stack-based semantics of Java bytecode. In this paper, we describe three novel techniques used in our Java bytecode interpreter, write-through top-of-stack caching (WT), position-based handler customization (PHC), and position-based speculative decoding (PSD), which ameliorate these problems on PowerPC processors. We show how each technique contributes to the overall performance of the interpreter for major Java benchmark programs on an IBM POWER3 processor. Of the three, PHC is the most effective. We also show that bytecode fetches are the main source of memory accesses and that PHC successfully eliminates the majority of them while keeping instruction cache miss ratios small.
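
The abstract's claim that bytecode fetches dominate memory accesses can be illustrated with a toy model. The sketch below is not the paper's interpreter and not an implementation of PHC: it assumes a hypothetical three-instruction stack machine (the opcodes `PUSH`, `ADD`, `HALT` and the 4-byte word size are invented for illustration) and only counts simulated memory accesses. One loop fetches every opcode and operand separately; the other loads an aligned word once and picks bytes out of it by their position in the word.

```python
# Hypothetical stack machine; opcodes and encoding are assumptions of this sketch.
PUSH, ADD, HALT = 0, 1, 2   # PUSH is followed by a one-byte operand
WORD = 4                    # assumed size of one aligned memory word, in bytes


def execute(next_byte):
    """Run the machine, pulling opcodes and operands from next_byte()."""
    stack = []
    while True:
        op = next_byte()
        if op == PUSH:
            stack.append(next_byte())
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {op}")


def run_bytewise(code):
    """Count one simulated memory access for every byte fetched."""
    pc = fetches = 0

    def next_byte():
        nonlocal pc, fetches
        value = code[pc]
        fetches += 1            # every opcode or operand costs one access
        pc += 1
        return value

    return execute(next_byte), fetches


def run_wordwise(code):
    """Count one simulated memory access per aligned word loaded."""
    pc = fetches = 0
    word_base, word = -1, b""

    def next_byte():
        nonlocal pc, fetches, word_base, word
        base = pc - pc % WORD
        if base != word_base:   # crossed into a new aligned word
            word_base, word = base, code[base:base + WORD]
            fetches += 1        # one access covers up to WORD bytes
        value = word[pc % WORD]  # position within the word selects the byte
        pc += 1
        return value

    return execute(next_byte), fetches


# Computes (2 + 3) + 4 in nine bytes of bytecode.
PROGRAM = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, ADD, HALT])
```

On `PROGRAM`, both loops leave 9 on the stack; the byte-wise loop counts nine accesses and the word-wise loop three. The counters model memory traffic only. In Python itself the second loop is not faster, and the toy machine has no branches, which a real interpreter would have to handle when reusing a loaded word.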