A cache based stack folding technique for high performance Java processors

Authors:
Isidoros Sideris;George Economakos;Kiamal Pekmestzi
Affiliations:
National Technical University of Athens, Athens, Greece;National Technical University of Athens, Athens, Greece;National Technical University of Athens, Athens, Greece
Venue:
JTRES '06 Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems
Year:
2006


Abstract

Java processors were introduced to provide hardware acceleration for Java applications by executing Java bytecodes directly in hardware. However, the stack-based nature of the Java Virtual Machine instruction set limits the achievable execution performance. To exploit instruction-level parallelism, the stack must be removed completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes into RISC-like instructions. However, the decoding throughput these algorithms achieve is limited. In this paper we propose a novel stack folding technique that uses a predecoded cache to store folded bytecodes, thus enabling reuse. The decoding throughput reaches 4 RISC instructions per cycle. With a superscalar back-end core, the obtained IPC is approximately 2.08 instructions per cycle (or 3.02 Java bytecodes per cycle).
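
To make the idea of folding with reuse concrete, the following is a minimal sketch in Python. It is not the paper's algorithm and not OPEX: it assumes a single fixed pattern (two `iload`s, one arithmetic bytecode, one `istore`) and models the predecoded cache as a plain dictionary keyed by bytecode address. The names `RiscOp`, `fold_group`, and `FoldingCache` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Assumed, simplified bytecode classes (illustration only).
PRODUCERS = {"iload"}                                        # push a local onto the stack
OPERATORS = {"iadd": "add", "isub": "sub", "imul": "mul"}    # pop two values, push one
CONSUMERS = {"istore"}                                       # pop the stack into a local


@dataclass(frozen=True)
class RiscOp:
    """Hypothetical three-address instruction: dst = src1 <op> src2."""
    op: str
    dst: int
    src1: int
    src2: int


def fold_group(code, pc):
    """Try to fold `iload a; iload b; <op>; istore c` starting at pc.

    Returns (RiscOp, number_of_bytecodes_folded), or None if the
    four bytecodes at pc do not match the assumed pattern.
    """
    if pc + 4 > len(code):
        return None
    (m1, a), (m2, b), (m3, _), (m4, c) = code[pc:pc + 4]
    if (m1 in PRODUCERS and m2 in PRODUCERS
            and m3 in OPERATORS and m4 in CONSUMERS):
        # Locals stand in for registers, so no stack traffic remains.
        return RiscOp(OPERATORS[m3], c, a, b), 4
    return None


class FoldingCache:
    """Toy stand-in for a predecoded cache: folded results keyed by address."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def decode(self, code, pc):
        if pc in self.entries:
            self.hits += 1              # reuse: no folding work on a hit
            return self.entries[pc]
        self.misses += 1
        result = fold_group(code, pc)
        if result is not None:
            self.entries[pc] = result   # store the folded form for later reuse
        return result


# Bytecodes as (mnemonic, operand) pairs; computes local3 = local1 + local2.
code = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3)]
cache = FoldingCache()
first = cache.decode(code, 0)    # miss: the group is folded and stored
second = cache.decode(code, 0)   # hit: the stored folded form is returned
```

In this sketch four bytecodes collapse into one register-style instruction, and the second visit to the same address skips folding entirely. That is the kind of reuse the abstract attributes to the predecoded cache, although a real design would also have to handle more patterns, cache capacity, and replacement.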