Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Theory of linear and integer programming
Theory of linear and integer programming
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ICS '96 Proceedings of the 10th international conference on Supercomputing
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Parameterized polyhedra and their vertices
International Journal of Parallel Programming
Parametric Analysis of Polyhedral Iteration Spaces
Journal of VLSI Signal Processing Systems - Special issue on application specific systems, architectures and processors
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Nonsingular Data Transformations: Definition, Validity, and Applications
International Journal of Parallel Programming
A matrix-based approach to global locality optimization
Journal of Parallel and Distributed Computing - Special issue on compilation and architectural support for parallel applications
Generation of Efficient Nested Loops from Polyhedra
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Loop Transformations for Restructuring Compilers: The Foundations
Loop Transformations for Restructuring Compilers: The Foundations
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Handling Memory Cache Policy with Integer Points Counting
Euro-Par '97 Proceedings of the Third International Euro-Par Conference on Parallel Processing
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Programming skills for a changing world: back to the basics
Journal of Computing Sciences in Colleges
Applications of storage mapping optimization to register promotion
Proceedings of the 18th annual international conference on Supercomputing
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
Analytical computation of Ehrhart polynomials: enabling more compiler analyses and optimizations
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
New Complexity Results on Array Contraction and Related Problems
Journal of VLSI Signal Processing Systems
Load elimination for low-power embedded processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Obtaining Affine Transformations to Improve Locality of Loop Nests
Programming and Computing Software
Memory optimization by counting points in integer transformations of parametric polytopes
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Computation of storage requirements for multi-dimensional signal processing applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Improving data locality by chunking
CC'03 Proceedings of the 12th international conference on Compiler construction
Experiences with enumeration of integer projections of parametric polytopes
CC'05 Proceedings of the 14th international conference on Compiler Construction
Integer affine transformations of parametric ℤ-polytopes and applications to loop nest optimization
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
A significant source for enhancing application performance and for reducing power consumption in embedded processor applications is to improve the usage of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework of nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly nested. New data layouts are propagated through the connected references and through the loop nests as constraints for optimizing the next connected reference in the same nest or in the other ones. Unlike many existing methods, special attention is paid to TLB (Translation Lookaside Buffer) effectiveness since TLB misses can take from tens to hundreds of processor cycles. Our approach only considers active data, that is, array elements that are actually accessed by a loop, in order to prevent useless memory loads and take advantage of storage compression and temporal locality. Moreover, the same data transformation is not necessarily applied to a whole array. Depending on the referenced data subsets, the transformation can result in different data layouts for a same array. This can significantly improve the performance since a priori incompatible references can be simultaneously optimized. Finally, the process does not only consider the innermost loop level but all levels. Hence, large strides when control returns to the enclosing loop are avoided in several cases, and better optimization is provided in the case of a small index range of the innermost loop.