When designing a SoC, meeting the required performance in terms of both processing power and power consumption is becoming increasingly challenging. Moreover, since the range of applications targeted by any single product is growing rapidly, reconfigurable accelerators are an increasingly sensible choice. Coarse-grain reconfigurable architectures offer an alternative with interesting performance/flexibility trade-offs compared with traditional approaches. This paper presents an original method for efficiently exploiting dynamic parallelism at both the loop level and the task level, an approach that remains rarely used. The method, called DHM (Dynamic Hardware Multiplexing), relies on a hardwired controller dedicated to run-time task scheduling and automatic loop unrolling. The paper shows that significant performance improvements can be achieved by combining intra-task and inter-task parallelism. The principles and their validation are presented through a case study on a coarse-grain reconfigurable architecture.