Application specific memory access, reuse and reordering for SDRAM
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
SDRAM is a commodity technology that delivers fast, cheap, high-capacity external memory in many cost-sensitive embedded applications. When designing with SDRAM, the available memory bandwidth depends strongly on the sequence of addresses requested. For applications with hard real-time performance requirements, some form of compile-time analysis is needed to guarantee that hard real-time deadlines are met. With SDRAM memories this analysis is difficult in general, which leads to conservative implementations. On-chip memory buffers enable data reuse and request reordering, which together ensure that bandwidth on an SDRAM interface is used efficiently. This paper outlines an automated procedure for synthesizing application-specific address generators that exploit data reuse in on-chip memory and transaction reordering on an external memory interface. We quantify the impact this has on memory bandwidth over a range of representative benchmarks. Across a range of parameterized designs, we observe up to a 50x reduction in the quantity of data fetched from external memory. Combined with reordering of transactions, this allows up to a 128x reduction in the memory access time of certain memory-intensive benchmarks implemented on an FPGA. Since the synthesis procedure yields monotonic memory addressing functions, we can extract tight worst-case execution-time (WCET) bounds that are useful in system analysis. We show that the resulting performance guarantees are significantly tighter than the absolute worst-case SDRAM performance.
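To illustrate why SDRAM bandwidth depends on the request sequence and how reordering helps, the following sketch uses a toy cost model (the costs, function names, and the one-open-row-per-bank abstraction are simplifying assumptions, not the paper's actual controller): a request to a bank whose open row differs from the requested row pays a row-activate penalty, so grouping same-row requests together cuts the total access time.

```python
# Toy SDRAM cost model: a request is a (bank, row) pair. Accessing a row
# other than the one currently open in that bank incurs an activate
# penalty. The costs below are illustrative, not real SDRAM timings.

ROW_HIT_COST = 1    # column access to an already-open row
ROW_MISS_COST = 10  # precharge + activate + column access

def access_cost(requests):
    """Total cost of serving requests in order, one open row per bank."""
    open_row = {}  # bank -> currently open row
    cost = 0
    for bank, row in requests:
        if open_row.get(bank) == row:
            cost += ROW_HIT_COST
        else:
            cost += ROW_MISS_COST
            open_row[bank] = row
    return cost

def reorder(requests):
    """Group requests by (bank, row) so same-row accesses are contiguous.
    Python's sort is stable, so order within each group is preserved."""
    return sorted(requests, key=lambda r: (r[0], r[1]))

# An access pattern that ping-pongs between two rows of bank 0:
pattern = [(0, r) for _ in range(4) for r in (0, 1)]
print(access_cost(pattern))           # every access misses: 8 * 10 = 80
print(access_cost(reorder(pattern)))  # 2 misses + 6 hits: 2*10 + 6 = 26
```

In this toy example, reordering alone reduces access time by roughly 3x; the paper's synthesized address generators additionally exploit on-chip data reuse to avoid refetching data at all.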