Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A datapath synthesis system for the reconfigurable datapath architecture
ASP-DAC '95 Proceedings of the 1995 Asia and South Pacific Design Automation Conference
IEEE Transactions on Computers
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Synthesis and Optimization of Digital Circuits
Synthesis and Optimization of Digital Circuits
Compilation Approach for Coarse-Grained Reconfigurable Architectures
IEEE Design & Test
XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Alleviating the Data Memory Bandwidth Bottleneck in Coarse-Grained Reconfigurable Arrays
ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors
Hi-index | 0.00 |
It is widely known that bandwidth limitations degrade parallel systems' performance. This paper presents a mapping methodology for coarse-grain reconfigurable arrays which alleviates the bandwidth bottleneck by exploiting the processing elements interconnection network for transferring values with data reuse opportunities. A novel mapping algorithm is also proposed that uses a resource-aware modulo scheduling technique. From the application of the proposed mapping approach, significant improvements in performance were achieved while we have also quantified these improvements in respect to crucial architecture parameters such as the memory latency and the register file size. For this reason, our methodology targets on a parametric architecture template which can model a large number of existing architectures of this kind.