Proceedings of the 10th International Workshop on Java Technologies for Real-time and Embedded Systems
Bytecode folding is an effective technique for speeding up execution in Java virtual machines. This paper investigates a hardware implementation of this technique on BlueJEP, a Java embedded processor. Since BlueJEP is a micro-programmed stack machine, we adopt a microinstruction-oriented approach, folding up to four microinstructions (which occasionally correspond to up to four bytecodes). Several processor versions, each supporting a different subset of folding patterns, are implemented, simulated, and synthesized on a Xilinx FPGA. The measurements show that, although the number of execution cycles is reduced, the longer critical path leads to lower overall performance. Taking device area into account, we conclude that, in our case, adding a second processor may be preferable to hardware folding. In general, we observe that folding efficiency can only be evaluated properly on a real implementation, rather than from theoretical estimates, because of the increased hardware complexity.
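The trade-off described above, fewer cycles but a longer clock period, can be illustrated with a small model. The following is only a sketch under stated assumptions: it folds at the bytecode level rather than the microinstruction level, uses a single hypothetical pattern (two loads, one ALU operation, one store), charges one cycle per issued group, and uses invented clock periods. None of these names or numbers come from the paper.

```python
# Illustrative model only: the pattern set, the one-cycle-per-group cost,
# and the clock periods below are assumptions, not data from the paper.

PRODUCERS = {"iload"}                       # push a value onto the stack
OPERATORS = {"iadd", "isub", "iand", "ior"}  # consume two values, push one
CONSUMERS = {"istore"}                      # pop a value into a local
MAX_FOLD = 4                                # at most four instructions per group


def fold(stream):
    """Greedily split a list of mnemonics into issue groups.

    A window of four instructions matching producer, producer, operator,
    consumer is issued as one folded group; anything else is issued alone.
    """
    groups = []
    i = 0
    while i < len(stream):
        window = stream[i:i + MAX_FOLD]
        if (len(window) == MAX_FOLD
                and window[0] in PRODUCERS
                and window[1] in PRODUCERS
                and window[2] in OPERATORS
                and window[3] in CONSUMERS):
            groups.append(window)
            i += MAX_FOLD
        else:
            groups.append([stream[i]])
            i += 1
    return groups


def execution_time_ns(cycles, clock_period_ns):
    """Execution time is the cycle count times the clock period."""
    return cycles * clock_period_ns


# Hypothetical instruction stream: one foldable sequence, six other bytecodes.
example = ["iload", "iload", "iadd", "istore",
           "aload", "getfield", "iload", "if_icmplt", "iinc", "goto"]

baseline_cycles = len(example)        # one cycle per bytecode without folding
folded_cycles = len(fold(example))    # one cycle per issue group with folding

# Hypothetical clock periods: folding logic lengthens the critical path.
baseline_time = execution_time_ns(baseline_cycles, 10)
folded_time = execution_time_ns(folded_cycles, 15)

if __name__ == "__main__":
    print("cycles:", baseline_cycles, "->", folded_cycles)
    print("time (ns):", baseline_time, "->", folded_time)
```

With these assumed numbers, the cycle count falls from 10 to 7 while the execution time rises from 100 ns to 105 ns, which is the kind of outcome that only shows up once the clock period of a synthesized design is known.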