As computer architectures evolve, placing more caches on multicore chips, locality becomes even more important. With the bandwidth between caches and RAM increasingly precious, the additional locality offered by new matrix representations is needed to keep multiple processors busy. The default storage representations of C and FORTRAN, row-major and column-major respectively, have fundamental deficiencies for many matrix computations. By switching the storage representation from Cartesian to block indices, one can exploit cache locality at every level, from L1 to paging. This paper changes only the storage representation, from row-major to Morton-hybrid, and applies it to matrix multiplication. Its purpose is to show that, even with only traditional iterative algorithms, simply changing the storage representation offers significant speedups.