A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
High-level address optimization and synthesis techniques for data-transfer-intensive applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ACM Computing Surveys (CSUR)
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
IEEE Transactions on Computers
Mapping a Single Assignment Programming Language to Reconfigurable Systems
The Journal of Supercomputing
Vi iMproved
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
Applications of storage mapping optimization to register promotion
Proceedings of the 18th annual international conference on Supercomputing
Optimized Generation of Data-Path from C Codes for FPGAs
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Expression Synthesis in Process Networks generated by LAURA
ASAP '05 Proceedings of the 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors
Improving the memory behavior of vertical filtering in the discrete wavelet transform
Proceedings of the 3rd conference on Computing frontiers
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Incremental hierarchical memory size estimation for steering of loop transformations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations
Transactions on High-Performance Embedded Architectures and Compilers I
Strength reduction of integer division and modulo operations
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Scalable, Wavelet-Based Video: From Server to Hardware-Accelerated Client
IEEE Transactions on Multimedia
Hi-index | 0.00 |
The high performance potential of an FPGA is not fully exploited if a design suffers a memory bottleneck. Therefore, a memory hierarchy is needed to reuse data in on-chip buffer memories and minimize the number of accesses to off-chip memory. Buffer memories not only hide the external memory latency, but can also be used to remap data and augment the on-chip bandwidth through parallel access of multiple buffers. This paper discusses the differences and similarities of memory hierarchies on processor- and on FPGA-based systems and presents a step-by-step methodology to construct a memory hierarchy on an FPGA.