Elimination of redundant memory traffic in high-level synthesis

Authors:
D. J. Kolson;A. Nicolau;N. Dutt
Affiliations:
Dept. of Inf. & Comput. Sci., California Univ., Irvine, CA
Venue:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year:
1996

Citing 0
Cited 8

Formalized methodology for data reuse exploration in hierarchical memory mappings
ISLPED '97 Proceedings of the 1997 international symposium on Low power electronics and design

Systematic data reuse exploration methodology for irregular access patterns
ISSS '00 Proceedings of the 13th international symposium on System synthesis

Static scheduling of multi-domain memories for functional verification
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design

Search space definition and exploration for nonuniform data reuse opportunities in data-dominant applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Copy Elimination for Parallelizing Compilers
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing

CuMAPz: a tool to analyze memory access patterns in CUDA
Proceedings of the 48th Design Automation Conference

Branch penalty reduction on IBM cell SPUs via software branch hinting
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

Memory performance estimation of CUDA programs
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors

Abstract

This paper presents a new transformation for scheduling memory-access operations in high-level synthesis. The transformation targets memory-intensive applications whose synthesized designs contain a secondary store accessed by explicit instructions. Such memory-intensive behavior is common in video compression, image convolution, hydrodynamics, and mechatronics. The transformation removes load and store instructions that become redundant or unnecessary as loops are transformed, which reduces the bandwidth demanded of the secondary memory. The technique is implemented in our Percolation-Based Scheduler, which we used to conduct experiments on a suite of memory-intensive benchmarks. Our results demonstrate a significant reduction in the number of memory operations and an increase in performance on these benchmarks.
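
The kind of redundancy the abstract describes can be illustrated with a small sketch. This is not the authors' Percolation-Based Scheduler or their algorithm; it is a hand-written before/after pair for one assumed loop, b[i] = a[i] + a[i+1], in which the value loaded as a[i+1] in one iteration is needed again as a[i] in the next. The class `CountingMemory` and the functions `smooth_naive`, `smooth_reuse`, and `run` are hypothetical names introduced only for this illustration.

```python
class CountingMemory:
    """Toy secondary store that counts explicit load and store operations."""

    def __init__(self, data):
        self.data = list(data)
        self.loads = 0
        self.stores = 0

    def load(self, addr):
        self.loads += 1
        return self.data[addr]

    def store(self, addr, value):
        self.stores += 1
        self.data[addr] = value


def smooth_naive(mem, src, dst, n):
    """b[i] = a[i] + a[i+1] with two loads and one store per iteration."""
    for i in range(n - 1):
        x = mem.load(src + i)
        y = mem.load(src + i + 1)
        mem.store(dst + i, x + y)


def smooth_reuse(mem, src, dst, n):
    """Same computation, but a[i+1] is kept in a local (a stand-in for a
    register) and reused as a[i] in the next iteration instead of reloaded."""
    if n < 2:
        return
    x = mem.load(src)              # only the first element needs its own load
    for i in range(n - 1):
        y = mem.load(src + i + 1)  # one new load per iteration
        mem.store(dst + i, x + y)
        x = y                      # carry the value across iterations


def run(kernel, n=8):
    """Run a kernel on a = [0, 1, ..., n-1]; assumes src and dst do not overlap."""
    mem = CountingMemory(list(range(n)) + [0] * (n - 1))
    kernel(mem, 0, n, n)
    return mem.data[n:], mem.loads, mem.stores


if __name__ == "__main__":
    print(run(smooth_naive))
    print(run(smooth_reuse))
```

Under these assumptions both versions produce the same output array and the same number of stores, while the second issues n loads instead of 2(n - 1). The sketch only shows why such loads are redundant; how a scheduler detects and removes them automatically is the subject of the paper itself.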