A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
The combinatorics of cache misses during matrix multiplication
Journal of Computer and System Sciences - Special issue on Internet algorithms
Enhancing Spatial Locality via Data Layout Optimizations
Euro-Par '98 Proceedings of the 4th International Euro-Par Conference on Parallel Processing
Efficient Interprocedural Data Placement Optimisation in a Parallel Library
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Evaluating ISA support and hardware support for recursive data layouts
HiPC'07 Proceedings of the 14th international conference on High performance computing
Hi-index | 0.00 |
Two-dimensional arrays are generally arranged in memory in row-major order or column-major order. Sophisticated programmers, or occasionally sophisticated compilers, match the loop structure to the language's storage layout in order to maximise spatial locality. Unsophisticated programmers do not, and the performance loss is often dramatic -- up to a factor of 20. With knowledge of how the array will be used, it is often possible to choose between the two layouts in order to maximise spatial locality. In this paper we study the Morton storage layout, which has substantial spatial locality whether traversed in row-major or column-major order. We present results from a suite of simple application kernels which show that, on the AMD Athlon and Pentium III, for arrays larger than 256 脳 256, Morton array layout, even implemented with a lookup table with no compiler support, is always within 61% of both row-major and column-major -- and is sometimes faster.