Elastic CGRAs

Authors:
Yuanjie Huang;Paolo Ienne;Olivier Temam;Yunji Chen;Chengyong Wu
Affiliations:
ICT, CAS, Beijing, China;EPFL, Lausanne, Switzerland;INRIA, Saclay, France;ICT, CAS, Beijing, China;ICT, CAS, Beijing, China
Venue:
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Year:
2013

Abstract

Vital technology trends such as voltage scaling and homogeneous multicore scaling have reached their limits, and architects are turning to alternative computing paradigms, such as heterogeneous and domain-specialized solutions. Coarse-Grain Reconfigurable Arrays (CGRAs) promise the performance of massively spatial computing while offering interesting trade-offs of flexibility versus energy efficiency. Yet configuring and scheduling execution for CGRAs generally runs into the classic difficulties that have hampered Very Long Instruction Word (VLIW) architectures: efficient schedules are difficult to generate, especially for applications with complex control flow and data structures, and they are inherently static and thus ill-adapted to variable-latency components (such as the read ports of caches). Over the years, VLIWs have been relegated to important but specific application domains where such issues are more under the control of the designers; similarly, statically scheduled CGRAs may prove inadequate for future general-purpose computing systems. In this paper, we introduce Elastic CGRAs, the superscalar processors of computing fabrics: no complex schedule needs to be computed at configuration time, and the operations execute dynamically in the CGRA when data are ready, thus exploiting the data parallelism that an application offers. We designed, down to a manufacturable layout, a simple CGRA on which we demonstrated and optimized our elastic control circuitry. We also built a complete compilation toolchain that transforms arbitrary C code into a configuration for the array. The area overhead (26.2%), critical path overhead (8.2%) and energy overhead (53.6%) of Elastic CGRAs over non-elastic CGRAs are significantly lower than the overhead of superscalar processors over VLIWs, while providing the same benefits. At such moderate costs, elasticity may prove to be one of the key enablers to make the adoption of CGRAs widespread.
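
The contrast between static scheduling and elastic, data-driven execution can be illustrated with a toy timing model. The sketch below is not the paper's control circuitry or toolchain. It only assumes a small dataflow graph in which an operation may start once all of its operands are available. The names `Op`, `elastic_finish_times` and `static_finish_times`, and all latency values, are hypothetical and chosen for illustration. The static variant fixes start times at configuration time from worst-case latencies, while the elastic variant uses the latencies actually observed at run time.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One operation in a dataflow graph (hypothetical model)."""
    name: str
    latency: int                                 # latency observed at run time, in cycles
    inputs: list = field(default_factory=list)   # names of producer operations


def elastic_finish_times(ops):
    """Data-driven firing: an op starts as soon as all of its operands are ready.

    `ops` must be listed in topological order (producers before consumers).
    """
    finish = {}
    for op in ops:
        start = max((finish[src] for src in op.inputs), default=0)
        finish[op.name] = start + op.latency
    return finish


def static_finish_times(ops, worst_case):
    """Static schedule: start times are fixed in advance from worst-case latencies."""
    finish = {}
    for op in ops:
        start = max((finish[src] for src in op.inputs), default=0)
        finish[op.name] = start + worst_case[op.name]
    return finish


# Illustrative graph: a cache load feeds a multiply, and both feed an add.
# Assumption: the load hits in the cache (2 cycles) but could take 10 on a miss.
ops = [
    Op("load", latency=2),
    Op("mul", latency=3, inputs=["load"]),
    Op("add", latency=1, inputs=["load", "mul"]),
]
worst_case = {"load": 10, "mul": 3, "add": 1}

elastic = elastic_finish_times(ops)
static = static_finish_times(ops, worst_case)
print("elastic:", elastic)
print("static: ", static)
```

In this model the static schedule always pays for the worst-case load latency, whereas the elastic execution finishes earlier whenever the load completes quickly. A real elastic fabric would enforce the same ordering with local handshake signals between neighbouring units rather than with a global table of finish times.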