The combination of scheduling, allocation, and mapping in a single algorithm
DAC '90 Proceedings of the 27th ACM/IEEE Design Automation Conference
Exploiting off-chip memory access modes in high-level synthesis
ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
Exploiting operation level parallelism through dynamically reconfigurable datapaths
Proceedings of the 39th annual Design Automation Conference
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Optimizing memory accesses for spatial computation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Automatic Allocation of Arrays to Memories in FPGA Processors with Multiple Memory Banks
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
(R) A Compile Time Partitioning Method for DOALL Loops on Distributed Memory Systems
ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators
ASAP '04 Proceedings of the Application-Specific Systems, Architectures and Processors, 15th IEEE International Conference
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A compiler approach to managing storage and memory bandwidth in configurable architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Integrated floorplanning, module-selection, and architecture generation for reconfigurable devices
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Automatic memory partitioning and scheduling for throughput and power optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 0.00 |
Modern, high performance configurable architectures integrate on-chip, distributed block RAM modules to provide ample data storage. Synthesizing applications to these complex systems requires an effective and efficient approach to conduct data partitioning and storage assignment. In this paper, we present a data and iteration space partitioning solution that focuses on minimizing remote memory accesses or, equivalently, maximizing the local computation. Using the same code but different data partitionings, we can achieve faster clock frequencies, without increasing the number of cycles, by simply minimizing global memory accesses. Other optimization techniques like scalar replacement, prefetching and buffer insertion can further minimize remote accesses and lead to average 4.8/spl times/ speedup in overall runtime.