Proceedings of the 42nd annual Design Automation Conference
In this paper we present a Binary Translation algorithm that detects, entirely at run time, sequences of instructions to be executed in a reconfigurable array, which in turn is coupled to an embedded Java processor. By translating a sequence of operations into a combinational circuit that performs the same computation, one can speed up the system and reduce energy consumption, at the obvious price of extra area. We present the cost of implementing this translation algorithm in hardware, together with the performance and energy gains the technique provides. Furthermore, we demonstrate that this translation algorithm is particularly easy to implement in a stack machine because of the way a stack machine computes. Algorithms from the embedded systems domain were accelerated by 4.6 times on average, while consuming almost 11 times less energy.
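The property credited to stack machines above can be illustrated with a toy model. The sketch below is not the paper's hardware algorithm: it assumes a hypothetical subset of Java-like bytecodes (`iload`, `iconst`, `istore`, `iadd`, `isub`, `imul`), and all helper names (`split_balanced`, `translate`, `evaluate`, `depth`) are illustrative. Because every operand comes from the operand stack, a sequence that starts and ends with an empty stack has no stack dependence on surrounding code, so candidate sequences can be found by tracking stack depth alone; symbolically executing them then yields an expression tree, i.e. a combinational circuit computing the same values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical subset of Java-like bytecodes, for illustration only.
Instr = Union[str, Tuple[str, int]]
BINARY_OPS = {"iadd": "+", "isub": "-", "imul": "*"}
APPLY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


@dataclass(frozen=True)
class Node:
    """One element of the combinational circuit: an input, a constant, or an ALU op."""
    op: str              # "local", "const", "+", "-" or "*"
    args: tuple = ()     # operand nodes for ALU ops
    value: int = 0       # local-variable index or constant value


def name_of(ins: Instr) -> str:
    return ins if isinstance(ins, str) else ins[0]


def split_balanced(code: List[Instr]) -> List[List[Instr]]:
    """Cut the code wherever the operand stack becomes empty again.

    Each resulting sequence has no stack dependence on code outside it.
    Unsupported opcodes raise KeyError in this sketch."""
    effect = {"iload": 1, "iconst": 1, "istore": -1, **{op: -1 for op in BINARY_OPS}}
    sequences, current, depth = [], [], 0
    for ins in code:
        depth += effect[name_of(ins)]
        if depth < 0:
            raise ValueError("operand stack underflow")
        current.append(ins)
        if depth == 0:
            sequences.append(current)
            current = []
    if current:
        raise ValueError("code ends with a non-empty operand stack")
    return sequences


def translate(code: List[Instr]) -> Dict[int, Node]:
    """Symbolically execute the code, returning the circuit output for each stored local."""
    stack: List[Node] = []
    written: Dict[int, Node] = {}   # locals produced inside the translated region
    for ins in code:
        name = name_of(ins)
        if name == "iload":
            idx = ins[1]
            # Forward a value computed earlier in the region, else read a circuit input.
            stack.append(written.get(idx, Node("local", value=idx)))
        elif name == "iconst":
            stack.append(Node("const", value=ins[1]))
        elif name == "istore":
            written[ins[1]] = stack.pop()
        else:
            b, a = stack.pop(), stack.pop()   # top of stack is the second operand
            stack.append(Node(BINARY_OPS[name], (a, b)))
    return written


def evaluate(node: Node, inputs: Dict[int, int]) -> int:
    """Compute a circuit output from the input local variables."""
    if node.op == "local":
        return inputs[node.value]
    if node.op == "const":
        return node.value
    a, b = (evaluate(arg, inputs) for arg in node.args)
    return APPLY[node.op](a, b)


def depth(node: Node) -> int:
    """Number of ALU levels on the longest path, i.e. the circuit's logic depth."""
    return 0 if not node.args else 1 + max(depth(arg) for arg in node.args)


# y = (a + b) * (c - 3); z = y + a, with a, b, c in locals 0, 1, 2 and y, z in 3, 4.
PROGRAM: List[Instr] = [
    ("iload", 0), ("iload", 1), "iadd", ("iload", 2), ("iconst", 3), "isub", "imul", ("istore", 3),
    ("iload", 3), ("iload", 0), "iadd", ("istore", 4),
]

if __name__ == "__main__":
    print(len(split_balanced(PROGRAM)))   # 2 stack-balanced sequences
    circuit = translate(PROGRAM)
    inputs = {0: 5, 1: 2, 2: 10}
    print(evaluate(circuit[3], inputs), evaluate(circuit[4], inputs))   # 49 54
    print(depth(circuit[4]))              # 3 ALU levels
```

In this toy model the 12 sequential bytecodes collapse into a circuit only three ALU levels deep, with loads and stores reduced to wiring. A real translator would also have to deal with memory accesses, branches, the finite size of the array and reconfiguration overhead, none of which are modeled here.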