A cache based stack folding technique for high performance Java processors
JTRES '06 Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems
A predecoding technique for ILP exploitation in Java processors
Journal of Systems Architecture: the EUROMICRO Journal
Exploiting an abstract-machine-based framework in the design of a Java ILP processor
Journal of Systems Architecture: the EUROMICRO Journal
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
The available Instruction Level Parallelism in Java bytecode (Java-ILP) is not readily exploitable because of dependencies involving stack operands. The sequentialization caused by stack dependencies can be overcome by identifying bytecode-traces: sequences of bytecode instructions that, when executed, leave the operand stack in the same state as it was at the beginning of the sequence. Instructions from different bytecode-traces have no stack-operand dependency and hence can be executed in parallel on multiple operand stacks. We propose a simultaneous multi-trace instruction-issue (SMTI) architecture for a processor that can issue instructions from multiple bytecode-traces to exploit Java-ILP. Extraction of bytecode-traces and nested bytecode folding are done in software during the method verification stage. SMTI combined with nested folding resulted in an average ILP speedup of 54% over the base in-order single-issue Java processor when evaluated on the SPECjvm98, Scimark and Linpack benchmarks.
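The definition of a bytecode-trace can be illustrated with a minimal sketch. It is not the extraction algorithm used in the paper: it assumes straight-line code with no branches, measures stack depth relative to the start of each trace, and uses a hypothetical `Instr` record whose `pops` and `pushes` fields are hand-written stack effects for a few JVM integer opcodes. The function name `extract_traces` is likewise an illustrative choice.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record: mnemonic plus its operand-stack effect."""
    name: str
    pops: int    # operands consumed from the stack
    pushes: int  # results produced onto the stack


def extract_traces(code: List[Instr]) -> List[List[Instr]]:
    """Split straight-line code into traces that leave the stack depth unchanged.

    Assumption: no control flow, and each trace starts at relative depth 0.
    """
    traces: List[List[Instr]] = []
    current: List[Instr] = []
    depth = 0  # stack depth relative to the start of the current trace
    for ins in code:
        if ins.pops > depth:
            # The instruction would consume operands pushed before this trace began.
            raise ValueError(f"{ins.name} pops below the trace start")
        depth += ins.pushes - ins.pops
        current.append(ins)
        if depth == 0:
            # The stack is back to its state at the trace start: close the trace.
            traces.append(current)
            current = []
    if current:
        raise ValueError("trailing instructions leave operands on the stack")
    return traces


# Bytecode for two independent statements, e.g. `a = b + c; d = e * f`.
example = [
    Instr("iload_1", 0, 1), Instr("iload_2", 0, 1),
    Instr("iadd", 2, 1), Instr("istore_3", 1, 0),
    Instr("iload 4", 0, 1), Instr("iload 5", 0, 1),
    Instr("imul", 2, 1), Instr("istore 6", 1, 0),
]

traces = extract_traces(example)
for t in traces:
    print([ins.name for ins in t])
```

In this example the code splits into two traces of four instructions each. Neither trace reads a stack operand produced by the other, so in the sense described above they could be issued on separate operand stacks. Dependencies through local variables are a separate concern that this sketch does not model.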