A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Finding neighbors of equal size in linear quadtrees and octrees in constant time
CVGIP: Image Understanding
Matrix computations (3rd ed.)
Recursive Array Layouts and Fast Matrix Multiplication
IEEE Transactions on Parallel and Distributed Systems
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
The history of FORTRAN I, II, and III
ACM SIGPLAN Notices - Special issue: History of programming languages conference
The Opie compiler from row-major source to Morton-ordered matrices
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Special Feature: Epigrams on programming
ACM SIGPLAN Notices
Analyzing block locality in Morton-order and Morton-hybrid matrices
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms
Proceedings of the 2006 workshop on Memory system performance and correctness
Representation-transparent matrix algorithms with scalable performance
Proceedings of the 21st annual international conference on Supercomputing
Analyzing block locality in Morton-order and Morton-hybrid matrices
ACM SIGARCH Computer Architecture News
Optimizing memory access on GPUs using morton order indexing
Proceedings of the 48th Annual Southeast Regional Conference
A new and effective hierarchical overlay structure for Peer-to-Peer networks
Computer Communications
Hi-index | 0.00 |
Suppose the bits of a computer word are partitioned into d disjoint sets, each of which is used to represent one of a d-tuple of cartesian indices into d-dimensional space. Then, regardless of the partition, simple group operations and comparisons can be implemented for each index on a conventional processor in a sequence of two or three register operations.These indexings allow any blocked algorithm from linear algebra to use some non-standard matrix orderings that increase locality and enhance their performance. The underlying implementations were designed for alternating bit postitions to index Morton-ordered matrices, but they apply, as well, to any bit partitioning. A hybrid ordering of the elements of a matrix becomes possible, therefore, with row-/column-major ordering within cache-sized blocks and Morton ordering of those blocks, themselves. So, one can enjoy the temporal locality of nested blocks, as well as compiler optimizations on row- or column-major ordering in base blocks.