The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures

Authors:
M. Anton Ertl;David Gregg
Affiliations:
-;-
Venue:
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Year:
2001

Citing 6

Optimizing an ANSI C interpreter with superoperators
POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Stack caching for interpreters
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

The structure and performance of interpreters
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems

Accurate indirect branch prediction
Proceedings of the 25th annual international symposium on Computer architecture

Indirect threaded code
Communications of the ACM

Threaded code
Communications of the ACM

Cited 2

Domain-Specific Language for HW/SW Co-design for FPGAs
DSL '09 Proceedings of the IFIP TC 2 Working Conference on Domain-Specific Languages

Vectorization technology to improve interpreter performance
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Abstract

Romer et al. (ASPLOS '96) examined several interpreters and concluded that they behave much like general-purpose integer programs such as gcc. We show that there is an important class of interpreters that behaves very differently. Efficient virtual machine interpreters perform a large number of indirect branches (3.2%-13% of all executed instructions in our benchmarks, taking up to 61%-79% of the cycles on a machine with no branch prediction). We evaluate how various branch prediction schemes and methods of reducing the misprediction penalty affect the performance of several virtual machine interpreters. Our results show that, for current branch predictors, threaded-code interpreters cause fewer mispredictions and are almost twice as fast as switch-based interpreters on modern superscalar architectures.
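
To make the two dispatch styles compared in the abstract concrete, here is a minimal sketch in C. It is not taken from the paper: the three-instruction stack VM, the names run_switch, run_threaded, sample_program and check_sample, and the fixed buffer sizes are all assumptions made for illustration. The threaded variant relies on the GNU C "labels as values" extension (supported by GCC and Clang), so it is not portable ISO C. The point to notice is that the switch version funnels every VM instruction through one shared indirect branch, while the threaded version ends each VM instruction with its own indirect branch, which is presumably what gives a branch target buffer more to work with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction stack VM, invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Sample program: push 40, push 2, add, halt. */
static const long sample_program[] = { OP_PUSH, 40, OP_PUSH, 2, OP_ADD, OP_HALT };

/* Switch-based dispatch: all VM instructions share the single
 * indirect branch that the compiler generates for the switch.
 * Assumes a well-formed program that never needs more than 16 stack slots. */
static long run_switch(const long *code)
{
    long stack[16];
    long *sp = stack;
    const long *ip = code;

    for (;;) {
        switch (*ip++) {
        case OP_PUSH: *sp++ = *ip++;        break;
        case OP_ADD:  sp--; sp[-1] += sp[0]; break;
        case OP_HALT: return sp[-1];
        default:      return -1;            /* unknown opcode */
        }
    }
}

/* A threaded-code cell holds either a handler address or an operand. */
typedef union { void *addr; long value; } cell;

/* Threaded-code dispatch via GNU C labels-as-values (an extension,
 * not ISO C). Opcodes are first translated to handler addresses;
 * each handler then jumps directly to the next one.
 * Assumes valid opcodes and that every OP_PUSH has its operand. */
static long run_threaded(const long *code, size_t n)
{
    static void *const labels[] = { &&do_push, &&do_add, &&do_halt };
    cell threaded[32];
    long stack[16];
    long *sp = stack;
    const cell *ip = threaded;
    size_t i;

    if (n > 32)
        return -1;                          /* sketch-only size limit */

    for (i = 0; i < n; i++) {
        long op = code[i];
        threaded[i].addr = labels[op];
        if (op == OP_PUSH) {                /* copy the inline operand */
            i++;
            threaded[i].value = code[i];
        }
    }

#define NEXT goto *(ip++)->addr             /* per-instruction indirect branch */
    NEXT;
do_push: *sp++ = (ip++)->value;  NEXT;
do_add:  sp--; sp[-1] += sp[0];  NEXT;
do_halt: return sp[-1];
#undef NEXT
}

/* Sanity check: both dispatchers must agree on the sample program. */
static void check_sample(void)
{
    size_t n = sizeof sample_program / sizeof sample_program[0];
    assert(run_switch(sample_program) == run_threaded(sample_program, n));
}
```

Calling check_sample() from a test harness exercises both interpreters on the same program. The sketch says nothing about the speed difference reported in the abstract; measuring that would require realistic VM workloads and hardware counters.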