This paper presents an automatic method for generating a parallel PU (Processing Unit) for an algorithm with non-affine array references. The PU processes data according to a user-defined functionality and accesses data through an optimized memory access controller. Parallelism is achieved jointly through data tiling and computation tiling. A generated TTC (Tile Transfer Controller) distributes data tiles from external memory to the local memories. For a given application and a user-defined level of parallelism, a set of possible data partitionings is explored, and the solutions with the smallest internal memory and the best temporal performance are chosen. In this work, the automatic method serves as a front end to high-level synthesis. For each chosen solution, a complete synthesizable C model, including the TTC, is generated.
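The tile-transfer scheme described above can be pictured with a small C sketch. It is only an illustration under stated assumptions, not the generated model from the paper: the array size `N`, the tile edge `TILE`, the helper names `ttc_load_tile`, `ttc_store_tile`, `pu_process`, and `run_tiled`, and the placeholder computation (doubling each element) are all hypothetical. A single local buffer is processed sequentially here, whereas the paper's PU works on several local memories in parallel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8     /* external array is N x N (assumed size) */
#define TILE 4  /* tile edge; must divide N in this sketch */

/* Hypothetical TTC step: copy one TILE x TILE tile from external to local memory. */
static void ttc_load_tile(int ext[N][N], int local[TILE][TILE],
                          size_t ti, size_t tj)
{
    for (size_t r = 0; r < TILE; ++r)
        memcpy(local[r], &ext[ti * TILE + r][tj * TILE], TILE * sizeof(int));
}

/* Hypothetical TTC step: copy a processed tile back to external memory. */
static void ttc_store_tile(int ext[N][N], int local[TILE][TILE],
                           size_t ti, size_t tj)
{
    for (size_t r = 0; r < TILE; ++r)
        memcpy(&ext[ti * TILE + r][tj * TILE], local[r], TILE * sizeof(int));
}

/* Placeholder for the user-defined functionality: doubles every element. */
static void pu_process(int local[TILE][TILE])
{
    for (size_t r = 0; r < TILE; ++r)
        for (size_t c = 0; c < TILE; ++c)
            local[r][c] *= 2;
}

/* Walk over all tiles: load, process locally, store. */
void run_tiled(int in[N][N], int out[N][N])
{
    assert(N % TILE == 0);   /* this sketch handles full tiles only */
    int local[TILE][TILE];   /* stands in for one local memory */

    for (size_t ti = 0; ti < N / TILE; ++ti)
        for (size_t tj = 0; tj < N / TILE; ++tj) {
            ttc_load_tile(in, local, ti, tj);
            pu_process(local);
            ttc_store_tile(out, local, ti, tj);
        }
}
```

In this sketch the local buffer holds `TILE * TILE` elements, so a smaller tile reduces internal memory at the cost of more transfers. That is the kind of trade-off the partitioning exploration weighs.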