Improving Data Locality by Array Contraction

Authors:
Yonghong Song;Rong Xu;Cheng Wang;Zhiyuan Li
Affiliations:
-;-;-;IEEE Computer Society
Venue:
IEEE Transactions on Computers
Year:
2004

Citing 29
Cited 5

Structured dataflow analysis for arrays and its use in an optimizing complier

Software—Practice & Experience
Network flows: theory, algorithms, and applications

Network flows: theory, algorithms, and applications
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing

ACM Computing Surveys (CSUR)
Software pipelining

ACM Computing Surveys (CSUR)
Interprocedural array region analyses

International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Fusion of Loops for Parallelism and Locality

IEEE Transactions on Parallel and Distributed Systems
Experience with efficient array data flow analysis for array privatization

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Eliminating conflict misses for high performance architectures

ICS '98 Proceedings of the 12th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach

Computer architecture (2nd ed.): a quantitative approach
Automatic storage management for parallel programs

Parallel Computing - Special issues on languages and compilers for parallel computers
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
A matrix-based approach to global locality optimization

Journal of Parallel and Distributed Computing - Special issue on compilation and architectural support for parallel applications
Fast greedy weighted fusion

Proceedings of the 14th international conference on Supercomputing
Optimized unrolling of nested loops

Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction

ICS '01 Proceedings of the 15th international conference on Supercomputing
A unified framework for schedule and storage optimization

Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Blocking and array contraction across arbitrarily nested loops using affine partitioning

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Array dataflow analysis

Compiler optimizations for scalable parallel systems
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Precise Data Locality Optimization of Nested Loops

The Journal of Supercomputing
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors

IEEE Transactions on Computers
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Collective Loop Fusion for Array Contraction

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A Comparison of Compiler Tiling Algorithms

CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
On the Complexity of Loop Fusion

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Loop Alignment for Memory Accesses Optimization

Proceedings of the 12th international symposium on System synthesis

A sample-based cache mapping scheme

LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A polynomial-time algorithm for memory space reduction

International Journal of Parallel Programming
Bee+Cl@k: an implementation of lattice-based array contraction in the source-to-source translator rose

Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
MPSoC memory optimization using program transformation

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Incremental hierarchical memory size estimation for steering of loop transformations

ACM Transactions on Design Automation of Electronic Systems (TODAES)

Quantified Score

Hi-index	14.98

Visualization

Abstract

Array contraction is a program transformation which reduces array size while preserving the correct output. In this paper, we present an aggressive array-contraction technique and study its impact on memory system performance. This technique, called controlled SFC, combines loop shifting and controlled loop fusion to maximize opportunities for array contraction within a given loop nesting. A controlled fusion scheme is used to prevent overfusing loops and to avoid excessive pressure on the cache and the registers. Reducing the array size increases data reuse because of the increased average number of memory operations on the same memory addresses. Furthermore, if the data size of a loop nest fits in the cache after array contraction, then repeated references to the same variable in the loop nest will generate cache hits, assuming set conflicts are eliminated successfully.