Reducing off-chip memory access via stream-conscious tiling on multimedia applications

Authors:
Chunhui Zhang;Fadi Kurdahi
Affiliations:
Department of EECS, University of California, Irvine, CA;Department of EECS, University of California, Irvine, CA
Venue:
International Journal of Parallel Programming
Year:
2007

Abstract

The iteration space of a loop nest is the set of all loop iterations bounded by the loop limits. Tiling the iteration space can effectively exploit the available parallelism, which is essential to multiprocessor compilation and pipelined architecture design. Tiling also improves data locality, which can dramatically reduce memory accesses and, consequently, the energy consumed by those accesses. However, previous studies on tiling were based on data dependences, so arrays without dependences, such as input arrays (data streams), were not considered. In this paper, we extend the tiling exploration to accommodate those dependence-free arrays as well, and propose a stream-conscious tiling scheme for off-chip memory access optimization. We show that input arrays are at least as important as the arrays with data dependences when the focus is on memory access optimization rather than parallelism extraction. Our approach is verified on TI's low-power C55X DSP with popular multimedia applications, reducing off-chip memory accesses by 67% on average compared with traditional iteration space tiling.
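
To illustrate why a dependence-free input array can still matter when choosing tiles, the following is a minimal sketch and not the scheme proposed in the paper. It assumes a hypothetical k x k window filter over an image and a simplified memory model in which each tile's input footprint is fetched from off-chip memory once, with no reuse across tiles. The function name `input_loads` and all parameters are illustrative assumptions.

```python
def input_loads(height, width, k, tile_h, tile_w):
    """Count off-chip loads of the input image for a k x k window filter.

    Assumed model (not from the paper): the input footprint of each tile
    is fetched once into on-chip memory, and nothing is reused between tiles.
    """
    # Size of the output iteration space for a "valid" window filter.
    out_h, out_w = height - k + 1, width - k + 1
    loads = 0
    for i0 in range(0, out_h, tile_h):
        for j0 in range(0, out_w, tile_w):
            # Clip partial tiles at the boundary of the iteration space.
            th = min(tile_h, out_h - i0)
            tw = min(tile_w, out_w - j0)
            # A th x tw tile of outputs reads a (th+k-1) x (tw+k-1) input patch.
            loads += (th + k - 1) * (tw + k - 1)
    return loads


if __name__ == "__main__":
    # Two tilings with the same number of iterations per tile (64).
    square = input_loads(66, 66, 3, 8, 8)
    strip = input_loads(66, 66, 3, 1, 64)
    print("8x8 tiles: ", square)
    print("1x64 tiles:", strip)
```

Under these assumptions, both tilings cover the same iterations with equally sized tiles, yet the square tiles fetch fewer input elements than the strip tiles because neighboring iterations share more of the input window. The input array carries no dependence, so a purely dependence-driven tiling analysis would not distinguish the two shapes.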