REGIMap: register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs)

Authors: Mahdi Hamzeh; Aviral Shrivastava; Sarma Vrudhula
Affiliations: Arizona State University, Tempe, AZ (all three authors)
Venue: Proceedings of the 50th Annual Design Automation Conference
Year: 2013

Abstract

Coarse-Grained Reconfigurable Architectures (CGRAs) are an extremely attractive platform when both performance and power efficiency are paramount. Although the power efficiency of CGRAs can be very high, their performance critically hinges on the capabilities of the compiler, because a CGRA compiler must explicitly perform pipelining, scheduling, placement, and routing of operations. Existing CGRA compilers struggle with two main problems: 1) effectively utilizing the local register files in the processing elements (PEs), and 2) high compilation times. This paper significantly advances the state of the art in CGRA compilation. We first create a precise and general formulation of the problem of mapping loops onto CGRAs that takes the local registers into account, and then, from the insights gained from this formulation, distill an efficient and constructive heuristic solution. We show that the mapping problem, once characterized, can be reduced to the problem of finding a maximal weighted clique in the product graph of the time-extended CGRA and the data dependence graph of the kernel. When applied to several kernels from multimedia and SPEC2006 benchmarks, our heuristic achieves on average 1.89× better performance than state-of-the-art methods. A unique feature of our heuristic is that it learns from failed attempts and constructively changes the schedule to achieve better mappings at lower compilation times.
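The clique reduction can be illustrated on a toy problem. The Python sketch below is a simplification written for illustration, not the authors' formulation and not the REGIMap heuristic. It assumes a hypothetical 2×2 mesh of PEs, an initiation interval `II = 2`, a four-operation data dependence graph `DFG`, unit-latency operations, and a schedule horizon `MAX_CYCLE = 4`. Each product-graph vertex is a triple (operation, PE, cycle), and the hypothetical helper `compatible` acts as the edge test: it rejects pairs that place the same operation twice, occupy one PE in the same modulo slot, or violate a data dependence. A dependent operation may run one cycle later on a mesh neighbor, or on the same PE while the value sits in that PE's local register file (register capacity is not checked). Weights are ignored, and in place of a heuristic, an exhaustive backtracking search looks for a clique with one vertex per operation. That is the largest clique possible, because two vertices of the same operation are never adjacent.

```python
from itertools import product

# Hypothetical toy instance (assumptions, not taken from the paper):
# a 2x2 mesh of PEs, initiation interval II = 2, unit-latency ops.
PES = [(r, c) for r in range(2) for c in range(2)]
II = 2
MAX_CYCLE = 4  # schedule horizon for the search

# Hypothetical loop-body DFG: op -> list of ops that produce its operands.
DFG = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def dependence_ok(prod_pe, prod_t, cons_pe, cons_t):
    """Can a value produced at (prod_pe, prod_t) be consumed at (cons_pe, cons_t)?"""
    delay = cons_t - prod_t
    if cons_pe == prod_pe:
        # Same PE: the value waits in the local register file
        # (simplified: register capacity is not checked).
        return 1 <= delay < II
    # Different PE: read from a mesh neighbor exactly one cycle later.
    return delay == 1 and manhattan(prod_pe, cons_pe) == 1


def compatible(u, v):
    """Edge test of the product graph; u and v are (op, pe, cycle) triples."""
    (op1, pe1, t1), (op2, pe2, t2) = u, v
    if op1 == op2:
        return False  # an op is placed exactly once
    if pe1 == pe2 and t1 % II == t2 % II:
        return False  # same PE in the same modulo slot
    if op1 in DFG[op2]:  # op1 feeds op2
        return dependence_ok(pe1, t1, pe2, t2)
    if op2 in DFG[op1]:  # op2 feeds op1
        return dependence_ok(pe2, t2, pe1, t1)
    return True


def find_mapping(ops):
    """Backtracking search for a clique with one product-graph vertex per op."""
    candidates = {op: [(op, pe, t) for pe, t in product(PES, range(MAX_CYCLE))]
                  for op in ops}
    chosen = []

    def extend(i):
        if i == len(ops):
            return True
        for v in candidates[ops[i]]:
            # v joins the clique only if it is adjacent to every chosen vertex.
            if all(compatible(v, u) for u in chosen):
                chosen.append(v)
                if extend(i + 1):
                    return True
                chosen.pop()
        return False

    return list(chosen) if extend(0) else None


mapping = find_mapping(["a", "b", "c", "d"])
for op, pe, t in mapping:
    print(f"{op}: PE {pe}, cycle {t} (slot {t % II})")
# a: PE (0, 0), cycle 0 (slot 0)
# b: PE (0, 1), cycle 0 (slot 0)
# c: PE (0, 0), cycle 1 (slot 1)
# d: PE (1, 0), cycle 2 (slot 0)
```

On this toy input the search places `a` and `b` in cycle 0 on adjacent PEs, `c` one cycle later on `a`'s PE, and `d` on a neighbor in cycle 2. Exhaustive backtracking is exponential in the worst case, which is one reason a constructive heuristic matters at realistic kernel and array sizes.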