Advances in high-level synthesis tools have greatly facilitated the use of FPGAs as general-purpose computing platforms. When the data path is optimized for parallelism, memory becomes a crucial bottleneck that limits further performance gains: simultaneous data access is tightly restricted by the data mapping strategy and by memory port constraints. Memory partitioning maps the data elements of one logical array onto multiple physical banks so that accesses to the array can proceed in parallel. Previous memory partitioning methods focused mainly on cyclic partitioning for single-port memory. In this work we propose a generalized memory partitioning framework that provides high data throughput for on-chip memories. We generalize cyclic partitioning to block-cyclic partitioning, which enlarges the design space that can be explored. We build the conflict detection algorithm on polytope emptiness testing and use integer point counting in polytopes to generate intra-bank offsets. The framework also supports partitioning for multi-port memory. Experimental results show that, compared to the state-of-the-art partitioning algorithm, the proposed algorithm reduces the number of block RAMs by 19.58%, slices by 20.26%, and DSPs by 50%.
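To make the idea of block-cyclic partitioning concrete, the following is a minimal sketch for a one-dimensional array, not the algorithm of this work. It assumes the textbook block-cyclic mapping (bank = floor(addr / B) mod N), and it detects conflicts by brute-force enumeration of the iteration domain rather than by polytope emptiness testing. The names `bank_of`, `offset_of`, and `has_conflict` are hypothetical and chosen for illustration only.

```python
def bank_of(addr, block, banks):
    """Bank index under an assumed textbook block-cyclic mapping.

    block == 1 degenerates to plain cyclic partitioning.
    """
    return (addr // block) % banks


def offset_of(addr, block, banks):
    """Intra-bank offset under the same assumed mapping."""
    return (addr // (block * banks)) * block + addr % block


def has_conflict(refs, domain, block, banks, ports=1):
    """Return True if some iteration sends more than `ports` accesses to one bank.

    refs   -- functions mapping an iteration index to an array address
    domain -- iterable of iteration indices
    Brute-force stand-in for an exact emptiness test; illustration only.
    """
    for i in domain:
        per_bank = {}
        for ref in refs:
            b = bank_of(ref(i), block, banks)
            per_bank[b] = per_bank.get(b, 0) + 1
        if max(per_bank.values()) > ports:
            return True
    return False


# Three references A[i], A[i+1], A[i+2] issued in the same cycle.
refs = [lambda i: i, lambda i: i + 1, lambda i: i + 2]

cyclic_3_banks = has_conflict(refs, range(16), block=1, banks=3)
cyclic_2_banks = has_conflict(refs, range(16), block=1, banks=2)
block_cyclic_dual_port = has_conflict(refs, range(16), block=2, banks=2, ports=2)
```

In this toy case, cyclic partitioning needs three single-port banks to serve the three references without conflict, while a block-cyclic mapping with block size 2 can serve them from two dual-port banks. This is the kind of trade-off a larger design space is meant to expose.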