Allowing for ILP in an embedded Java processor

Authors:
Ramesh Radhakrishnan;Deependra Talla;Lizy Kurian John
Affiliations:
Laboratory for Computer Architecture, Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas;Laboratory for Computer Architecture, Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas;Laboratory for Computer Architecture, Electrical and Computer Engineering Department, The University of Texas at Austin, Austin, Texas
Venue:
Proceedings of the 27th annual international symposium on Computer architecture
Year:
2000

Abstract

Java processors are ideal for embedded and network computing applications such as Internet TVs, set-top boxes, smart phones, and other consumer electronics. In this paper, we investigate cost-effective microarchitectural techniques to exploit parallelism in Java bytecode streams. First, we propose a fill unit that stores decoded bytecodes in a decoded bytecode cache. This mechanism improves the fetch and decode bandwidth of Java processors by 2 to 3 times. These additional hardware units can also perform optimizations such as instruction folding. This is particularly significant because experiments with the Verilog model of the Sun Microsystems picoJava-II core demonstrate that instruction folding lies on the critical path. Moving the folding logic off the critical path of the processor and into the fill unit improves the clock frequency by 25%. Out-of-order ILP exploitation is not investigated because of its prohibitive cost, but in-order dual issue with a 64-entry decoded bytecode cache yields a 10% to 14% improvement in execution cycles. Another contribution of the paper is a stack disambiguation technique that eliminates false dependencies between different types of stack accesses. Stack disambiguation exposes further parallelism: a dual in-order issue microengine with a 64-entry bytecode cache yields an additional 10% reduction in cycles, for an aggregate reduction of 17% to 24% in execution cycles.
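
To make the idea of instruction folding concrete, here is a minimal software sketch, not the hardware mechanism evaluated in the paper. It assumes a single folding pattern (two local-variable loads, one ALU operation, one store) and a hypothetical tuple encoding of bytecodes. The names `fold`, `LOADS`, `ALU_OPS`, and `STORES` are illustrative only. The point it shows is that a stack-based sequence can collapse into one register-style operation, which a fill unit could do once, off the critical path, before writing the result to a decoded bytecode cache.

```python
# Illustrative sketch only: one assumed folding pattern, hypothetical encoding.
# A bytecode is a tuple: (opcode,) or (opcode, local_variable_index).
LOADS = {"iload"}
ALU_OPS = {"iadd", "isub", "imul", "iand", "ior", "ixor"}
STORES = {"istore"}


def fold(bytecodes):
    """Fold 'load a; load b; op; store c' into ('fold', op, c, a, b).

    Bytecodes that do not match the pattern are passed through unchanged.
    """
    out = []
    i = 0
    while i < len(bytecodes):
        window = bytecodes[i:i + 4]
        if (len(window) == 4
                and window[0][0] in LOADS
                and window[1][0] in LOADS
                and window[2][0] in ALU_OPS
                and window[3][0] in STORES):
            op = window[2][0]
            dst = window[3][1]
            src1 = window[0][1]
            src2 = window[1][1]
            # One register-style operation replaces four stack operations.
            out.append(("fold", op, dst, src1, src2))
            i += 4
        else:
            out.append(bytecodes[i])
            i += 1
    return out


# Bytecode for: local3 = local1 + local2; return local3
program = [
    ("iload", 1),
    ("iload", 2),
    ("iadd",),
    ("istore", 3),
    ("iload", 3),
    ("ireturn",),
]

folded = fold(program)
```

In this example the six input bytecodes become three entries, the first of which is the folded addition. A real folding unit would recognize more patterns and would operate on decoded hardware signals rather than tuples.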