Embedded systems are becoming increasingly complex. Besides offering additional processing capabilities, they are characterized by a high diversity of computational models coexisting in a single device. Although reconfigurable architectures have been shown to be a potential solution for such systems, they deliver significant speedups only for very specific dataflow-oriented kernels. Furthermore, reconfigurable fabric is still held back by its need for special tools and compilers, which clearly does not sustain backward software compatibility. In this paper, we propose a new technique to optimize both dataflow-oriented and control-flow-oriented code in a totally transparent process, without any modification to the source or binary code. To that end, we have developed a Binary Translation algorithm implemented in hardware, which works in parallel with a MIPS processor. The proposed mechanism transforms sequences of instructions at runtime so that they can be executed on a dynamic coarse-grain reconfigurable array that supports speculative execution. Executing the MiBench suite, we show performance improvements of up to 2.5 times while reducing the required energy by a factor of 1.7, using trivial hardware resources.
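To make the idea of mapping an instruction sequence onto a coarse-grain array more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a toy instruction format (a destination register plus source registers) and an array organized in levels, where an instruction can be placed in a level only after all producers of its source operands. The name `allocate_levels` and the tuple format are hypothetical and chosen only for illustration. Only true (read-after-write) dependencies are tracked, and speculation and resource limits are ignored.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy format: (destination register or None, source registers).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def allocate_levels(instrs: List[Instr]) -> List[int]:
    """Place each instruction in the earliest level that respects
    read-after-write dependencies within the sequence.

    Assumption: the array behaves like a dataflow graph, so only true
    dependencies constrain placement, and each level has unlimited units.
    """
    produced_at: Dict[str, int] = {}  # register -> level of its latest writer
    levels: List[int] = []
    for dest, srcs in instrs:
        # Sources not written inside the sequence come from the register
        # file and are available before level 0 (encoded as -1).
        deepest = max((produced_at.get(s, -1) for s in srcs), default=-1)
        level = deepest + 1
        levels.append(level)
        if dest is not None:
            produced_at[dest] = level
    return levels


# Four instructions in program order; the third is independent of the
# first two, so it can share level 0 with the first.
sequence: List[Instr] = [
    ("r1", ("r2", "r3")),  # r1 = r2 op r3
    ("r4", ("r1", "r5")),  # r4 = r1 op r5  (depends on r1)
    ("r6", ("r2", "r7")),  # r6 = r2 op r7  (independent)
    ("r8", ("r4", "r6")),  # r8 = r4 op r6  (depends on r4 and r6)
]
print(allocate_levels(sequence))  # [0, 1, 0, 2]
```

In this toy case the four instructions fit in three levels instead of four sequential steps, which illustrates where the parallelism of such an array could come from. A real translator would also have to handle memory accesses, branches, and a finite number of functional units per level.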