Heterogeneous coarse-grained processing elements: a template architecture for embedded processing acceleration

Authors:
Giovanni Ansaloni;Paolo Bonzini;Laura Pozzi
Affiliations:
University of Lugano (USI), Switzerland;University of Lugano (USI), Switzerland;University of Lugano (USI), Switzerland
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2009

Abstract

Reconfigurable architectures are good candidates for application accelerators that cannot be set in stone at production time. FPGAs, however, often suffer from the area and performance penalty intrinsic to gate-level reconfigurability. To reduce this overhead, coarse-grained reconfigurable arrays (CGRAs) are reconfigurable at the ALU level, but a successful design needs more than computational power: the main bottleneck is usually memory transfers. Just as the integration of hardwired multiplier and memory blocks enabled FPGAs to implement digital signal processing applications efficiently, in this paper we study a customizable architecture template based on heterogeneous processing elements (multipliers, ALU clusters and memories) that provides enough flexibility to realize fast pipelined implementations of various loop kernels on a CGRA.