A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiling for numa parallel machines
Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse
Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse
Hi-index | 0.00 |
The performance gap between processors and off-chip memories has widened in the last years and is expected to widen even more. Today, it is widely accepted that caches improve significantly the performance of programs, since most of the programs exhibit temporal and/or spatial locality in their memory reference patterns. However, conflict misses can be a major obstacle preventing an application from utilizing the data cache effectively. While array padding can reduce conflict misses it can also increase the data space requirements significantly. In this paper, we present a compiler-based data transformation strategy, called the "generalized data transformations," for reducing inter-array conflict misses in embedded applications. We present the theory behind the generalized data transformations and discuss how they can be integrated with compiler-based loop transformations. Our experimental results demonstrate that the generalized data transformations are very effective in improving data cache behavior of embedded applications.