Optimizing SDRAM bandwidth for custom FPGA loop accelerators

Authors:
Samuel Bayliss;George A. Constantinides
Affiliations:
Imperial College, London, United Kingdom;Imperial College, London, United Kingdom
Venue:
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Year:
2012

Citing 15
Cited 4

Scanning polyhedra with DO loops

PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Memory access scheduling

Proceedings of the 27th annual international symposium on Computer architecture
Generation of Efficient Nested Loops from Polyhedra

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Array allocation taking into account SDRAM characteristics

ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Loop Transformations for Restructuring Compilers: The Foundations

Loop Transformations for Restructuring Compilers: The Foundations
Estimating influence of data layout optimizations on SDRAM energy consumption

Proceedings of the 2003 international symposium on Low power electronics and design
Code Generation in the Polyhedral Model Is Easier Than You Think

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Lattice-Based Memory Allocation

IEEE Transactions on Computers
Computer Architecture, Fourth Edition: A Quantitative Approach

Computer Architecture, Fourth Edition: A Quantitative Approach
Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time

Proceedings of the International Symposium on Code Generation and Optimization
Predator: a predictable SDRAM memory controller

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Automatic On-chip Memory Minimization for Data Reuse

FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
CoRAM: an in-fabric memory architecture for FPGA-based computing

Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
The polyhedral model is more widely applicable than you think

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction

Improving high level synthesis optimization opportunity through polyhedral transformations

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Polyhedral-based data reuse optimization for configurable computing

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Improving polyhedral code generation for high-level synthesis

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Memory bandwidth is critical to achieving high performance in many FPGA applications. The bandwidth of SDRAM memories is, however, highly dependent upon the order in which addresses are presented on the SDRAM interface. We present an automated tool for constructing an application specific on-chip memory address sequencer which presents requests to the external memory with an ordering that optimizes off-chip memory bandwidth for fixed on-chip memory resource. Within a class of algorithms described by affine loop nests, this approach can be shown to reduce both the number of requests made to external memory and the overhead associated with those requests. Data presented shows a trade off between the use of on-chip resources and achievable off-chip memory bandwidth where a range of improvements from 3.6x to 4x gain in efficiency on the external memory interface can be gained at a cost of up to a 1.4x increase in the ALUTs dedicated to address generation circuits in an Altera Stratix III device.