Improving the use of the memory hierarchy is a significant source of performance gains and power savings in embedded processor applications. This objective classically translates into optimizing spatial and temporal data locality, especially for nested loops. In this paper, we focus on temporal data locality. Unlike many existing methods, our approach pays special attention to TLB (Translation Lookaside Buffer) effectiveness, since a TLB miss can take up to three times more cycles than a cache miss. We propose a generalization of the traditional approach to temporal locality improvement, called data sequence localization, which reduces the number of iterations that separate accesses to a given array element.
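To make the quantity being reduced concrete, the following sketch counts how many iterations separate consecutive accesses to the same array element under two loop orders. It is only an illustration of that distance, using ordinary loop interchange on a toy reference `A[j]`; it is not the data sequence localization transformation itself, and the names `max_reuse_distance`, `access`, and `N` are hypothetical, chosen for this example.

```python
def max_reuse_distance(iterations, access):
    """Return the largest number of iterations separating two consecutive
    accesses to the same array element (0 if no element is reused)."""
    last_seen = {}  # element -> index of the iteration that last touched it
    worst = 0
    for t, point in enumerate(iterations):
        elem = access(*point)
        if elem in last_seen:
            worst = max(worst, t - last_seen[elem])
        last_seen[elem] = t
    return worst


N = 4  # assumed loop bound, kept small for illustration


def access(i, j):
    """Toy reference A[j]: the element touched depends on j only."""
    return j


# Original order: i is the outer loop, j the inner loop.
original = [(i, j) for i in range(N) for j in range(N)]

# Interchanged order: j is the outer loop, i the inner loop.
interchanged = [(i, j) for j in range(N) for i in range(N)]

print(max_reuse_distance(original, access))      # 4: A[j] is reused once per outer iteration
print(max_reuse_distance(interchanged, access))  # 1: A[j] is reused on consecutive iterations
```

In the original order each element is revisited only after a full sweep of the inner loop, so the distance grows with `N`; after interchange every reuse falls on the next iteration. A shorter distance makes it more likely that the element, and the page holding it, is still resident in the cache and the TLB when it is accessed again.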