Structured dataflow analysis for arrays and its use in an optimizing complier
Software—Practice & Experience
Network flows: theory, algorithms, and applications
Network flows: theory, algorithms, and applications
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Interprocedural array region analyses
International Journal of Parallel Programming - Special issue: selected papers from the eighth international workshop on languages and compilers for parallel computing
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
Experience with efficient array data flow analysis for array privatization
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Automatic storage management for parallel programs
Parallel Computing - Special issues on languages and compilers for parallel computers
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
A matrix-based approach to global locality optimization
Journal of Parallel and Distributed Computing - Special issue on compilation and architectural support for parallel applications
Proceedings of the 14th international conference on Supercomputing
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Compiler optimizations for scalable parallel systems
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Precise Data Locality Optimization of Nested Loops
The Journal of Supercomputing
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
Improving Effective Bandwidth through Compiler Enhancement of Global Cache Reuse
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Collective Loop Fusion for Array Contraction
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
A Comparison of Compiler Tiling Algorithms
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
On the Complexity of Loop Fusion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Loop Alignment for Memory Accesses Optimization
Proceedings of the 12th international symposium on System synthesis
A sample-based cache mapping scheme
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A polynomial-time algorithm for memory space reduction
International Journal of Parallel Programming
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Incremental hierarchical memory size estimation for steering of loop transformations
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hi-index | 14.98 |
Array contraction is a program transformation which reduces array size while preserving the correct output. In this paper, we present an aggressive array-contraction technique and study its impact on memory system performance. This technique, called controlled SFC, combines loop shifting and controlled loop fusion to maximize opportunities for array contraction within a given loop nesting. A controlled fusion scheme is used to prevent overfusing loops and to avoid excessive pressure on the cache and the registers. Reducing the array size increases data reuse because of the increased average number of memory operations on the same memory addresses. Furthermore, if the data size of a loop nest fits in the cache after array contraction, then repeated references to the same variable in the loop nest will generate cache hits, assuming set conflicts are eliminated successfully.