Are Bytecodes an Atavism?

  • Authors: Theo D'Hondt

  • Affiliations: Programming Technology Lab, Vrije Universiteit Brussel, Brussels, Belgium B1050

  • Venue: Self-Sustaining Systems
  • Year: 2008

Abstract

The notion of bytecodes can be traced back to the 1960s, with BCPL O-codes. These were essentially used to pursue platform independence. Later, with Pascal p-codes and Smalltalk bytecodes, the objective shifted to the concept of virtual machines as precursors to dedicated hardware implementations, culminating in Lilith and SOAR. More recently, Java adopted a similar approach, but with the advent of efficient JIT technology, bytecodes resumed their role as an intermediary representation of programs written in some higher-level language. It is our conjecture that using bytecodes in this capacity is an atavism, a throwback to times when hardware bytecode machines were the ultimate target. We suggest that the question of an optimal intermediary representation must be raised. In this paper we investigate the exact opposite of the bytecode approach: we define an intermediary notation that is as close as possible to the semantics of the programming language under consideration. It is then a question of applying the correct compiler technology to produce a JIT strategy that generates efficient machine code. A more interesting question addressed here is whether a virtual machine can be built using this strategy that matches a bytecode interpreter in perceived performance, while giving the running program much more control over its execution than is the case in the bytecode approach. We investigate an uncompromising approach, in which a unified memory architecture hosts all structures relevant during program execution, including program data structures, program representation, interpreter caches and runtime stacks. We show, by existence proof, that it is possible to build a virtual machine along these lines that can match a bytecode implementation in performance while giving much more "self" control to the running program.
Two cases are presented here: the Pico language and virtual machine, which were co-designed with the unified memory approach in mind, and a Scheme virtual machine intended to match the performance of PLT Scheme.
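
The contrast the abstract draws, between a flat bytecode sequence and an intermediary notation that stays close to the source language, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tuple-based tree, the opcode names (PUSH, ADD, MUL) and both evaluator functions are hypothetical and are not taken from the Pico or Scheme virtual machines described in the paper.

```python
# Hypothetical illustration only; not the Pico or Scheme VM from the paper.

# The expression (1 + 2) * 3 as a tree that mirrors the source-language structure.
TREE = ("mul", ("add", ("num", 1), ("num", 2)), ("num", 3))

# The same expression flattened into stack-machine bytecode (invented opcodes).
BYTECODE = [("PUSH", 1), ("PUSH", 2), ("ADD", None), ("PUSH", 3), ("MUL", None)]


def eval_tree(node):
    """Evaluate the tree form by recursive descent over its nodes."""
    tag = node[0]
    if tag == "num":
        return node[1]
    left, right = eval_tree(node[1]), eval_tree(node[2])
    if tag == "add":
        return left + right
    if tag == "mul":
        return left * right
    raise ValueError(f"unknown node tag: {tag}")


def eval_bytecode(code):
    """Evaluate the bytecode form with an explicit operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


if __name__ == "__main__":
    # Both representations denote the same computation.
    print(eval_tree(TREE), eval_bytecode(BYTECODE))
```

In the tree form the program structure remains visible to whatever inspects or transforms it at run time, whereas the bytecode form has already discarded that structure. The sketch says nothing about performance, which is the question the paper itself addresses.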