IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A dynamically adaptive DSP for heterogeneous reconfigurable platforms
Proceedings of the conference on Design, automation and test in Europe
A dynamically adaptive DSP for heterogeneous reconfigurable platforms
Proceedings of the conference on Design, automation and test in Europe
Recurrence-aware instruction set selection for extensible embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and Architectural Exploration of Expression-Grained Reconfigurable Arrays
SASP '08 Proceedings of the 2008 Symposium on Application Specific Processors
Architectural exploration of the ADRES coarse-grained reconfigurable array
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
MORA: a new coarse-grain reconfigurable array for high throughput multimedia processing
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
A high-performance data path for synthesizing DSP kernels
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
Reconfigurable Architectures are good candidates for application accelerators that cannot be set in stone at production time. FPGAs however, often suffer from the area and performance penalty intrinsic in gate-level reconfigurability. To reduce this overhead, coarse-grained reconfigurable arrays (CGRAs) are reconfigurable at the ALU level, but a successful design needs more than computational power---the main bottleneck usually being memory transfers. Just like the integration of hardwired multiplier and memory blocks enabled FPGAs to efficiently implement digital signal processing applications, in this paper we study a customizable architecture template based on heterogeneous processing elements (multipliers, ALU clusters and memories) that provides enough flexibility to realize fast pipelined implementations of various loop kernels on a CGRA.