Automatic On-chip Memory Minimization for Data Reuse

Authors:
Qiang Liu;George A. Constantinides;Konstantinos Masselos;Peter Y. K. Cheung
Affiliations:
Imperial College, UK;Imperial College, UK;University of Peloponnese, Greece;Imperial College, UK
Venue:
FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Year:
2007

Citing 0
Cited 9

Combining data reuse with data-level parallelization for FPGA-targeted hardware compilation: a geometric programming framework
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Multi-port abstraction layer for FPGA intensive memory exploitation applications
Journal of Systems Architecture: the EUROMICRO Journal

Automatic memory partitioning: increasing memory parallelism via data structure partitioning
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Application specific memory access, reuse and reordering for SDRAM
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications

Optimizing SDRAM bandwidth for custom FPGA loop accelerators
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays

Compiling C-like languages to FPGA hardware: some novel approaches targeting data memory organisation
VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference

FPGA based efficient on-chip memory for image processing algorithms
Microelectronics Journal

Analytical synthesis of bandwidth-efficient SDRAM address generators
Microprocessors & Microsystems

Using memory profile analysis for automatic synthesis of pointers code
ACM Transactions on Embedded Computing Systems (TECS)

Abstract

FPGA-based computing engines have become a promising option for the implementation of computationally intensive applications due to high flexibility and parallelism. However, one of the main obstacles to overcome when trying to accelerate an application on an FPGA is the bottleneck in off-chip communication, typically to large memories. Often it is known at compile-time that the same data item is accessed many times, and as a result can be loaded once from large off-chip RAM onto scarce on-chip RAM, alleviating this bottleneck. This paper addresses how to automatically derive an address mapping that reduces the size of the required on-chip memory for a given memory access pattern. Experimental results demonstrate that, in practice, our approach reduces on-chip storage requirements to the minimum, corresponding to a reduction in on-chip memory size of up to 40× (average 10×) for some benchmarks compared to a naive approach. At the same time, no clock period penalty or increase in control logic area compared to this approach is observed for these benchmarks.
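
As a rough illustration of why the address mapping matters, the sketch below compares two ways of sizing an on-chip buffer for a small affine loop nest. It is not the paper's algorithm: the function names, the loop bounds, and the strides are all hypothetical, and the compact mapping is built by brute-force enumeration rather than derived in closed form. The "naive" size here is assumed to be the full address span touched by the loop nest, while the compact mapping assigns each distinct off-chip address its own dense on-chip slot.

```python
from itertools import product


def accessed_addresses(bounds, strides, offset=0):
    """Enumerate the off-chip addresses touched by an affine loop nest.

    bounds  -- iteration count of each loop (hypothetical example input)
    strides -- address stride contributed by each loop index
    """
    return [
        offset + sum(s * i for s, i in zip(strides, idx))
        for idx in product(*(range(b) for b in bounds))
    ]


def naive_buffer_size(addrs):
    """Buffer covering the whole address span, including unused gaps."""
    return max(addrs) - min(addrs) + 1


def compact_mapping(addrs):
    """Map each distinct off-chip address to a dense on-chip slot."""
    return {a: slot for slot, a in enumerate(sorted(set(addrs)))}


# Illustrative pattern: A[16*i + j] for i in 0..3, j in 0..1.
addrs = accessed_addresses(bounds=(4, 2), strides=(16, 1))
mapping = compact_mapping(addrs)

print(naive_buffer_size(addrs), len(mapping))  # 50 8
```

For this pattern the span-sized buffer needs 50 words while only 8 distinct items are ever accessed. A lookup table like `mapping` would itself cost storage in hardware, which is why a closed-form mapping of the kind the abstract describes is preferable in practice.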