Resource aware mapping on coarse grained reconfigurable arrays

Authors:
Grigorios Dimitroulakos;Stavros Georgiopoulos;Michalis D. Galanis;Costas E. Goutis
Affiliations:
VLSI Design Laboratory, ECE Department, University of Patras, 26500 Patras, Greece;VLSI Design Laboratory, ECE Department, University of Patras, 26500 Patras, Greece;VLSI Design Laboratory, ECE Department, University of Patras, 26500 Patras, Greece;VLSI Design Laboratory, ECE Department, University of Patras, 26500 Patras, Greece
Venue:
Microprocessors & Microsystems
Year:
2009

Abstract

Coarse-grained reconfigurable array (CGRA) architectures have become increasingly popular due to their flexibility, scalability and performance. However, mapping programs onto these architectures is highly complex. This work presents a new methodology for effectively mapping applications onto CGRAs. The core of the methodology comprises the scheduling and register allocation phases, performed in a single step for the first time in the case of CGRAs. Additionally, modulo scheduling with backtracking capability is incorporated in this scheme. The main contributions of this work include a novel technique for minimizing the memory bandwidth bottleneck, a new priority scheme, and a new set of heuristics that aim to maximize Instruction Level Parallelism by efficiently managing the architecture's resources. The overall approach is retargetable with respect to a parametric architecture template that models a large number of architecture alternatives, and it has been automated in a prototype tool that permits experimental exploration. The experimental results showed that the achieved performance figures are very close to the best ones derived from a theoretical study of the architecture's resources and the applications' requirements. Moreover, applying the bandwidth optimization technique led to a 20-130% increase in operation parallelism. Finally, the experiments quantified the benefit of applying the new priority scheme and heuristics.
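
As a rough illustration of why memory bandwidth can cap parallelism in a modulo-scheduled loop, the sketch below computes the standard resource-constrained lower bound on the initiation interval (often called ResMII in the modulo scheduling literature). It is not the paper's algorithm. The function name, the resource classes and all the numbers are hypothetical and chosen only for illustration.

```python
def resource_min_ii(op_counts, unit_counts):
    """Resource-constrained lower bound on the initiation interval (II).

    op_counts:   operations per loop iteration, keyed by resource class
    unit_counts: available units (or ports) per cycle, same keys
    Both dictionaries are hypothetical inputs, not taken from the paper.
    """
    bound = 1
    for resource, uses in op_counts.items():
        units = unit_counts[resource]
        # Integer ceiling division: cycles needed if this resource were the only limit.
        bound = max(bound, (uses + units - 1) // units)
    return bound


# Hypothetical loop body: 12 ALU operations and 6 memory accesses per iteration.
ops = {"alu": 12, "mem": 6}

# Hypothetical arrays: 16 processing elements, with 2 or 4 memory ports.
narrow = {"alu": 16, "mem": 2}
wide = {"alu": 16, "mem": 4}

total_ops = sum(ops.values())
for name, arch in (("narrow", narrow), ("wide", wide)):
    ii = resource_min_ii(ops, arch)
    print(name, "II >=", ii, "ops/cycle <=", total_ops / ii)
```

In this made-up case the memory ports, not the processing elements, set the bound, so reducing memory accesses or adding bandwidth lowers the achievable initiation interval and raises the upper limit on operations per cycle.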