In this paper, we propose memory reduction as a new approach to data locality enhancement. Under this approach, the compiler reduces the size of the data repeatedly referenced in a collection of nested loops, so that between reuses the data are more likely to remain in faster memory, such as the cache. Specifically, we present an optimal algorithm that combines loop shifting, loop fusion, and array contraction to reduce the temporary array storage required to execute a collection of loops. When applied to 20 benchmark programs, our technique reduces the memory requirement, counting both data and code, by 51% on average. The transformed programs achieve an average speedup of 1.40, owing to the reduced footprint and the resulting improvement in data locality.
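To make the three transformations concrete, the following is a minimal hand-worked sketch, not the paper's algorithm. The loop bodies and the names `original`, `shifted_fused_contracted`, `t`, `prev`, and `cur` are hypothetical and chosen only for illustration. The second loop reads `t[i + 1]`, so the two loops cannot be fused directly; shifting the second loop by one iteration makes fusion legal, after which the temporary array `t` can be contracted to two scalars.

```python
def original(a):
    """Two separate loops communicating through a full temporary array t."""
    n = len(a)
    t = [0] * n                          # temporary storage: n elements
    for i in range(n):
        t[i] = 2 * a[i]
    b = [0] * max(n - 1, 0)
    for i in range(n - 1):
        b[i] = t[i] + t[i + 1]           # reads a value produced one iteration "ahead"
    return b


def shifted_fused_contracted(a):
    """Same computation after loop shifting, loop fusion, and array contraction.

    The consumer loop is shifted by one iteration, so that at fused
    iteration i it computes b[i - 1]. Both values it needs (t[i - 1] and
    t[i]) are then already available, and t shrinks to the scalars
    prev and cur.
    """
    n = len(a)
    b = [0] * max(n - 1, 0)
    prev = 0                             # stands in for t[i - 1]
    for i in range(n):
        cur = 2 * a[i]                   # producer statement: t[i]
        if i >= 1:
            b[i - 1] = prev + cur        # shifted consumer statement
        prev = cur
    return b
```

In this sketch the temporary storage drops from `n` elements to two scalars, and each produced value is consumed while it is still live, which is the kind of footprint reduction the abstract describes. How the shift amounts are chosen optimally across many loops is the subject of the paper's algorithm and is not shown here.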