Analyzing block locality in Morton-order and Morton-hybrid matrices

Authors:
K. Patrick Lorton; David S. Wise
Affiliations:
Schrodinger, New York, NY; Indiana University, Bloomington, IN
Venue:
ACM SIGARCH Computer Architecture News
Year:
2007

Abstract

As computer architectures change, placing more caches on multicore chips, locality of reference becomes even more important. Because bandwidth between caches and RAM is now even more valuable, the additional locality offered by new matrix representations will be needed to keep multiple processors busy. The default storage representations of C and Fortran, row-major and column-major respectively, are fundamentally deficient for many matrix computations. Switching the storage representation from Cartesian indices to block indices lets a program take better advantage of locality at every level of the memory hierarchy, from L1 cache to paging. This paper changes only the storage representation, from row-major to Morton-hybrid, and applies it to matrix multiplication. Its purpose is to show that, even with traditional iterative algorithms, simply changing the storage representation yields significant speedups.
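To make the idea of block indexing concrete, the following is a minimal sketch of one way a Morton-hybrid layout could be addressed. The abstract does not give the authors' implementation, so several things here are assumptions: the layout is taken to mean square tiles stored in Morton (Z) order with row-major order inside each tile, row bits are placed above column bits when interleaving, and the names `interleave_bits`, `morton_hybrid_offset`, `to_morton_hybrid`, and `multiply` are hypothetical. The sketch also assumes the matrix order is a multiple of the tile size and that the number of tiles per side is a power of two, so the storage has no gaps.

```python
def interleave_bits(row, col):
    """Morton (Z-order) index of a tile.

    Assumed convention: column bits go to even positions and
    row bits to odd positions of the result.
    """
    index = 0
    bit = 0
    while (row >> bit) or (col >> bit):
        index |= ((col >> bit) & 1) << (2 * bit)
        index |= ((row >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return index


def morton_hybrid_offset(i, j, tile):
    """Offset of element (i, j) in the flat array.

    Tiles are laid out in Morton order; elements inside a
    tile are laid out in row-major order.
    """
    block = interleave_bits(i // tile, j // tile)
    return block * tile * tile + (i % tile) * tile + (j % tile)


def to_morton_hybrid(rows, tile):
    """Copy a square list-of-lists matrix into Morton-hybrid storage."""
    n = len(rows)
    flat = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            flat[morton_hybrid_offset(i, j, tile)] = rows[i][j]
    return flat


def multiply(a, b, n, tile):
    """Traditional i-j-k product on Morton-hybrid operands.

    The loop nest is the usual iterative one; only the
    address computation differs from a row-major version.
    """
    c = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0
            for k in range(n):
                total += (a[morton_hybrid_offset(i, k, tile)]
                          * b[morton_hybrid_offset(k, j, tile)])
            c[morton_hybrid_offset(i, j, tile)] = total
    return c
```

In this sketch each tile occupies one contiguous run of `tile * tile` elements, which is the property that gives a blocked layout its locality. A tuned implementation would likely iterate tile by tile and avoid recomputing offsets in the inner loop; the version above favors clarity over speed.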