Coarse-Grained Reconfigurable Architectures (CGRAs) are an extremely attractive platform when both performance and power efficiency are paramount. Although CGRAs can be highly power-efficient, their performance critically hinges on the capabilities of the compiler, because a CGRA compiler must explicitly perform pipelining, scheduling, placement, and routing of operations. Existing CGRA compilers struggle with two main problems: 1) effectively utilizing the local register files in the processing elements (PEs), and 2) long compilation times. This paper significantly advances the state of the art in CGRA compilers by first creating a precise and general formulation of the problem of mapping loops onto CGRAs that takes the local registers into account, and then, using the insights gained from this formulation, distilling an efficient and constructive heuristic solution. We show that the mapping problem, once characterized, can be reduced to finding a maximal weighted clique in the product graph of the time-extended CGRA and the data dependence graph of the kernel. On several kernels from multimedia and SPEC2006 benchmarks, our heuristic achieves on average 1.89× better performance than state-of-the-art methods. A unique feature of our heuristic is that it learns from failed attempts and constructively changes the schedule to achieve better mappings at lower compilation times.
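The clique reduction can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's formulation: a hypothetical four-operation kernel (`DFG_EDGES`, `OPS`), a hypothetical 2x2 mesh CGRA with initiation interval `II = 2`, unit weights instead of the paper's weighted clique, and the simplifying rule that every dependence must be satisfied by a one-cycle hop to the same PE or a mesh neighbour (no local registers or routing resources are modelled). A product-graph vertex `(op, pe, t)` means "place `op` on `pe` at cycle `t` mod II"; two vertices are adjacent when they can coexist in one mapping. Because a clique can hold at most one vertex per operation, a clique whose size equals the number of operations is a complete, valid mapping. The maximum clique is found with textbook Bron-Kerbosch with pivoting, not with the paper's heuristic.

```python
from itertools import combinations

# Hypothetical toy kernel: data dependence graph as (producer, consumer) pairs.
OPS = ["a", "b", "c", "d"]
DFG_EDGES = [("a", "c"), ("b", "c"), ("c", "d")]

# Hypothetical 2x2 mesh CGRA and assumed initiation interval.
PES = [(r, c) for r in range(2) for c in range(2)]
II = 2

def connected(p, q):
    # Assumption: a value on PE p is readable next cycle by p itself or a mesh neighbour.
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) <= 1

def arc(p, t, q, s):
    # Edge of the time-extended CGRA: slot (p, t) feeds slot (q, (t + 1) mod II).
    return s == (t + 1) % II and connected(p, q)

def compatible(u, v):
    # Two placements can coexist if they map different ops to different slots
    # and every dependence between those ops follows a time-extended CGRA edge.
    (o1, p1, t1), (o2, p2, t2) = u, v
    if o1 == o2 or (p1, t1) == (p2, t2):
        return False
    for src, dst in DFG_EDGES:
        if (src, dst) == (o1, o2) and not arc(p1, t1, p2, t2):
            return False
        if (src, dst) == (o2, o1) and not arc(p2, t2, p1, t1):
            return False
    return True

# Product graph: one vertex per (op, slot) pair, edges between compatible pairs.
nodes = [(o, p, t) for o in OPS for p in PES for t in range(II)]
adj = {v: set() for v in nodes}
for u, v in combinations(nodes, 2):
    if compatible(u, v):
        adj[u].add(v)
        adj[v].add(u)

def max_clique(adj):
    # Bron-Kerbosch with pivoting plus a size bound, returning one maximum clique.
    best = []

    def bk(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        if len(r) + len(p) <= len(best):
            return  # even taking all of p cannot beat the current best
        u = max(p | x, key=lambda w: len(adj[w] & p))
        for v in list(p - adj[u]):
            bk(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    bk([], set(adj), set())
    return best

mapping = max_clique(adj)
print(sorted(mapping))  # one (op, pe, cycle) triple per operation
```

In this toy, a valid mapping at the chosen II exists exactly when the maximum clique has `len(OPS)` vertices; a compiler would otherwise retry with a larger II. The paper's formulation additionally accounts for local register files and uses weights, which this sketch deliberately omits.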