Theory of linear and integer programming
Theory of linear and integer programming
Strategies for cache and local memory management by global program transformation
Proceedings of the 1st International Conference on Supercomputing
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Automatic array alignment in data-parallel programs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Non-singular data transformations: definition, validity and applications
ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Transformations for imperfectly nested loops
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Advanced Data Layout Optimization for Multimedia Applications
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Improving Cache Effectiveness through Array Data Layout Manipulation in SAC
IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
IEEE Transactions on Computers
Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Spatial locality exploitation for runtime reordering of JPEG2000 wavelet data layouts
ACM Transactions on Design Automation of Electronic Systems (TODAES)
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Improving last level cache locality by integrating loop and data transformations
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.00 |
Data transformation is one of the key optimizations in maximizing cache locality. Traditional data transformation strategies employ linear data layouts, e. g., row-major or column-major, for multidimensional arrays. Although a linear layout matches the linear memory space well in most cases, it can only optimize for self-spatial locality for individual references. In this work, we propose a novel data layout transformation framework that is able to determine a tiled layout for each array in an application program. Tiled layout can exploit the group-spatial locality among different references and improve cache line utilization. In our strategy, the data elements accessed by different references in one loop iteration are placed into a tile and fetched into the same cache line at runtime. This helps minimizing conflict misses in caches. We evaluated our data layout transformation framework using 30 benchmarks on a commercial multicore machine. The experimental results show that our approach outperforms state-of-the-art data transformation strategies and works well with large core counts.