Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
An algorithm for drawing general undirected graphs
Information Processing Letters
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Drawing graphs nicely using simulated annealing
ACM Transactions on Graphics (TOG)
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Space-time scheduling of instruction-level parallelism on a raw machine
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
PipeRench: a co/processor for streaming multimedia acceleration
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Affinity-based cluster assignment for unrolled loops
ICS '02 Proceedings of the 16th international conference on Supercomputing
The Garp Architecture and C Compiler
Computer
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
The MorphoSys Parallel Reconfigurable System
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Mapping applications to the RaPiD configurable architecture
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Application driven embedded system design: a face recognition case study
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Modulo scheduling for highly customized datapaths to increase hardware reusability
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
CGRA express: accelerating execution using dynamic operation fusion
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Operation and data mapping for CGRAs with multi-bank memory
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
A CAD framework for Malibu: an FPGA with time-multiplexed coarse-grained elements
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
SIMD defragmenter: efficient ILP realization on data-parallel architectures
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Memory-Aware application mapping on coarse-grained reconfigurable arrays
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
International Journal of Organizational and Collective Intelligence
Hi-index | 0.00 |
Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost and energy efficiency. CGRAs consist of an array of function units and register files generally organized as a two dimensional grid. The most difficult challenge with deploying CGRAs is compiler scheduling technology that can map software implementations of compute intensive loops onto the array. Traditional schedulers are not suitable because they do not take into account the explicit routing of operand values that is necessary. In essence, the problem of binding operations to time slots and resources is extended to also include explicit routing of operands from producers to consumers. To tackle this problem, this paper introduces a software pipelining technique for mapping loop bodies onto CGRAs, referred to as modulo graph embedding. We leverage graph embedding from graph theory, which is used to draw graphs onto a target space. The loop body is essentially drawn onto the CGRA mesh, subject to modulo resource usage constraints. Modulo graph embedding is effective because it can take into account the communication structure of the loop body during mapping. On average, a compute utilization of 56-68% is achieved for a set of loop kernels across three 4x4 CGRA designs.