Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

Authors:
Hyunchul Park;Kevin Fan;Manjunath Kudlur;Scott Mahlke
Affiliations:
University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI;University of Michigan, Ann Arbor, MI
Venue:
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2006

Abstract

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files, generally organized as a two-dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can map software implementations of compute-intensive loops onto the array. Traditional schedulers are not suitable because they do not account for the explicit routing of operand values that CGRAs require. In essence, the problem of binding operations to time slots and resources is extended to also include explicit routing of operands from producers to consumers. To tackle this problem, this paper introduces a software pipelining technique for mapping loop bodies onto CGRAs, referred to as modulo graph embedding. We leverage graph embedding from graph theory, which is used to draw graphs onto a target space. The loop body is essentially drawn onto the CGRA mesh, subject to modulo resource usage constraints. Modulo graph embedding is effective because it can take into account the communication structure of the loop body during mapping. On average, a compute utilization of 56-68% is achieved for a set of loop kernels across three 4x4 CGRA designs.
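
To make the notion of "modulo resource usage constraints" concrete, the following is a minimal illustrative sketch and not the paper's algorithm. It places a small dataflow graph onto a 4x4 grid with a simple greedy heuristic, under assumptions that are ours rather than the authors': every operation has unit latency, an operand moves one mesh hop per cycle, routing resources are not reserved, and loop-carried dependences are ignored. The names `ModuloReservationTable`, `greedy_embed`, and the example loop body are hypothetical.

```python
from itertools import product


class ModuloReservationTable:
    """Records which function unit is busy in which slot, where slot = cycle mod II."""

    def __init__(self, rows, cols, ii):
        self.rows, self.cols, self.ii = rows, cols, ii
        self.table = {}  # (row, col, cycle % ii) -> operation name

    def is_free(self, row, col, cycle):
        return (row, col, cycle % self.ii) not in self.table

    def place(self, op, row, col, cycle):
        self.table[(row, col, cycle % self.ii)] = op

    def utilization(self):
        # Fraction of function-unit slots that hold an operation.
        return len(self.table) / (self.rows * self.cols * self.ii)


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_embed(ops, edges, rows=4, cols=4, ii=2):
    """Greedy placement sketch (assumption: ops are given in topological order).

    Returns (placement, table) or None if some operation cannot be placed.
    placement maps op -> (row, col, cycle).
    """
    mrt = ModuloReservationTable(rows, cols, ii)
    placement = {}
    for op in ops:
        preds = [p for p, c in edges if c == op]
        # Unit latency: a consumer starts at least one cycle after each producer.
        earliest = max((placement[p][2] + 1 for p in preds), default=0)
        best = None
        # Trying II consecutive cycles covers every modulo slot exactly once.
        for cycle in range(earliest, earliest + ii):
            for r, c in product(range(rows), range(cols)):
                if not mrt.is_free(r, c, cycle):
                    continue
                dists = [manhattan((r, c), placement[p][:2]) for p in preds]
                slack = [cycle - placement[p][2] for p in preds]
                # Assumed routing model: one hop per cycle, so the operand
                # must have enough cycles to travel from producer to consumer.
                if any(d > s for d, s in zip(dists, slack)):
                    continue
                key = (cycle, sum(dists), r, c)  # prefer early, then close
                if best is None or key < best:
                    best = key
        if best is None:
            return None
        cycle, _, r, c = best
        mrt.place(op, r, c, cycle)
        placement[op] = (r, c, cycle)
    return placement, mrt


# Hypothetical loop body: two loads feed a multiply, then an add, then a store.
ops = ["ld_a", "ld_b", "mul", "add", "st"]
edges = [("ld_a", "mul"), ("ld_b", "mul"), ("mul", "add"), ("add", "st")]

result = greedy_embed(ops, edges, rows=4, cols=4, ii=2)
if result is not None:
    placement, mrt = result
    for op in ops:
        print(op, placement[op])
    print("utilization:", mrt.utilization())
```

The key point is the `cycle % ii` index: because a new loop iteration starts every II cycles, two operations conflict whenever they use the same function unit in the same slot modulo II, even if their absolute cycles differ. The distance check stands in for the explicit operand routing the abstract describes; a real mapper would also reserve the interconnect and register files along each route.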