Is Morton Layout Competitive for Large Two-Dimensional Arrays?

Authors:
Jeyarajan Thiyagalingam;Paul H. J. Kelly
Affiliations:
-;-
Venue:
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Year:
2002

Citing 7

A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation

Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)

Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing

Language support for Morton-order matrices
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming

The combinatorics of cache misses during matrix multiplication
Journal of Computer and System Sciences - Special issue on Internet algorithms

Enhancing Spatial Locality via Data Layout Optimizations
Euro-Par '98 Proceedings of the 4th International Euro-Par Conference on Parallel Processing

Efficient Interprocedural Data Placement Optimisation in a Parallel Library
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers

Cited 2

DSiMCluster: A Simulation Model for Efficient Memory Analysis Experiments of DSM Clusters
Simulation

Evaluating ISA support and hardware support for recursive data layouts
HiPC'07 Proceedings of the 14th international conference on High performance computing

Abstract

Two-dimensional arrays are generally arranged in memory in row-major order or column-major order. Sophisticated programmers, or occasionally sophisticated compilers, match the loop structure to the language's storage layout in order to maximise spatial locality. Unsophisticated programmers do not, and the performance loss is often dramatic -- up to a factor of 20. With knowledge of how the array will be used, it is often possible to choose between the two layouts in order to maximise spatial locality. In this paper we study the Morton storage layout, which has substantial spatial locality whether traversed in row-major or column-major order. We present results from a suite of simple application kernels which show that, on the AMD Athlon and Pentium III, for arrays larger than 256 × 256, Morton array layout, even implemented with a lookup table with no compiler support, is always within 61% of both row-major and column-major -- and is sometimes faster.
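
To illustrate what Morton indexing with a lookup table can look like, here is a minimal Python sketch. The abstract does not describe the paper's table scheme, so everything here is an assumption made for illustration: the helper names (`dilate`, `build_tables`, `morton_index`), the choice of placing row bits in the odd positions and column bits in the even positions, and the restriction to square arrays whose side is a power of two.

```python
def dilate(x: int) -> int:
    """Spread the bits of x apart so that bit k moves to bit 2k."""
    result = 0
    bit = 0
    while x:
        result |= (x & 1) << (2 * bit)
        x >>= 1
        bit += 1
    return result


def build_tables(n: int):
    """Precompute per-row and per-column Morton offsets for an n x n array.

    Assumes n is a power of two (an illustrative restriction).
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    col_table = [dilate(j) for j in range(n)]       # column bits -> even positions
    row_table = [dilate(i) << 1 for i in range(n)]  # row bits -> odd positions
    return row_table, col_table


def morton_index(i: int, j: int, row_table, col_table) -> int:
    """Offset of element (i, j): two table lookups and one bitwise OR."""
    return row_table[i] | col_table[j]


if __name__ == "__main__":
    rows, cols = build_tables(4)
    for i in range(4):
        print([morton_index(i, j, rows, cols) for j in range(4)])
```

Under these assumptions, each 2 × 2 block of the array occupies four consecutive offsets, and the same holds recursively for 4 × 4 blocks and larger. This is why a traversal along either a row or a column keeps revisiting nearby memory, at the cost of two table lookups per access.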