Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
A datapath synthesis system for the reconfigurable datapath architecture
ASP-DAC '95 Proceedings of the 1995 Asia and South Pacific Design Automation Conference
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Formalized methodology for data reuse exploration for low-power hierarchical memory mappings
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Computers
Adapting software pipelining for reconfigurable computing
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Introduction to Algorithms
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
RaPiD - Reconfigurable Pipelined Datapath
FPL '96 Proceedings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers
A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications
FCCM '98 Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines
A systolic array optimizing compiler
A systolic array optimizing compiler
Automatic compilation to a coarse-grained reconfigurable system-opn-chip
ACM Transactions on Embedded Computing Systems (TECS)
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Operation and data mapping for CGRAs with multi-bank memory
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Memory access optimization in compilation for coarse-grained reconfigurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Memory-Aware application mapping on coarse-grained reconfigurable arrays
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
UNTANGLED: A Game Environment for Discovery of Creative Mapping Strategies
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Design of the coarse-grained reconfigurable architecture DART with on-line error detection
Microprocessors & Microsystems
Hi-index | 0.00 |
Coarse grain reconfigurable array architectures have become increasingly popular due to their flexibility, scalability and performance. However, the mapping of programs on these architectures is characterized by huge complexity. This work presents a new mapping methodology for effectively mapping applications on coarse grained reconfigurable arrays. The core of this methodology comprises of the scheduling and register allocation phases performed, for the first time in the case of CGRAs, in a single step. Additionally, modulo scheduling with backtracking capability is incorporated in this scheme. The main contribution of this work includes a novel technique for minimizing the memory bandwidth bottleneck, a new priority scheme and a new set of heuristics which target on the maximization of the Instruction Level Parallelism by efficiently managing the architecture's resources. The overall approach is retargetable with respect to a parametric architecture template modelling a large number of architecture alternatives and it has been automated with a prototype tool which permits experimental exploration. The experimental results showed that the achieved performance figures are very close to the most effective ones derived from the theoretical study on the architecture's resources and the applications requirements. Moreover, the application of the bandwidth optimization technique lead to a 20-130% increase on operation parallelism. Finally, the experiments quantified the benefit from applying the new priority scheme and heuristics.