HSRA: high-speed, hierarchical synchronous reconfigurable array
FPGA '99 Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays
IEEE Transactions on Computers
VPR: A new packing, placement and routing tool for FPGA research
FPL '97 Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications
Synchronous Interlocked Pipelines
ASYNC '02 Proceedings of the 8th International Symposium on Asynchronus Circuits and Systems
Architectures and algorithms for field-programmable gate arrays with embedded memory
Architectures and algorithms for field-programmable gate arrays with embedded memory
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Synthesis of synchronous elastic architectures
Proceedings of the 43rd annual Design Automation Conference
Tartan: evaluating spatial computation for whole program execution
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Outer loop pipelining for application specific datapaths in FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Electronic Notes in Theoretical Computer Science (ENTCS)
Application Experiments: MPPA and FPGA
FCCM '09 Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines
Designing Modular Hardware Accelerators in C with ROCCC 2.0
FCCM '10 Proceedings of the 2010 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Telescopic units: a new paradigm for performance optimization of VLSI designs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Theory of latency-insensitive design
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Bitwidth cognizant architecture synthesis of custom hardware accelerators
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
A Distributed Controller for Managing Speculative Functional Units in High Level Synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Quantifying the cost and benefit of latency insensitive communication on FPGAs
Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays
Hi-index | 0.00 |
Vital technology trends such as voltage scaling and homogeneous multicore scaling have reached their limits and architects turn to alternate computing paradigms, such as heterogeneous and domain-specialized solutions. Coarse-Grain Reconfigurable Arrays (CGRAs) promise the performance of massively spatial computing while offering interesting trade-offs of flexibility versus energy efficiency. Yet, configuring and scheduling execution for CGRAs generally runs into the classic difficulties that have hampered Very-Long Instruction Word (VLIW) architectures: efficient schedules are difficult to generate, especially for applications with complex control flow and data structures, and they are inherently static - thus, in adapted to variable-latency components (such as the read ports of caches). Over the years, VLIWs have been relegated to important but specific application domains where such issues are more under the control of the designers; similarly, statically-scheduled CGRAs may prove inadequate for future general-purpose computing systems. In this paper, we introduce Elastic CGRAs, the superscalar processors of computing fabrics: no complex schedule needs to be computed at configuration time, and the operations execute dynamically in the CGRA when data are ready, thus exploiting the data parallelism that an application offers. We designed, down to a manufacturable layout, a simple CGRA where we demonstrated and optimized our elastic control circuitry. We also built a complete compilation toolchain that transforms arbitrary C code in a configuration for the array. The area overhead (26.2%), critical path overhead (8.2%) and energy overhead (53.6%) of Elastic CGRAs over non-elastic CGRAs are significantly lower than the overhead of superscalar processors over VLIWs, while providing the same benefits. At such moderate costs, elasticity may prove to be one of the key enablers to make the adoption of CGRAs widespread.