Language support for Morton-order matrices

  • Authors:
  • David S. Wise; Jeremy D. Frens; Yuhong Gu; Gregory A. Alexander

  • Affiliations:
  • Computer Science Dept., Indiana University, Bloomington, IN; Dept. of Computer Science, Calvin College, Grand Rapids, MI and Indiana University; Oracle Corporation, One Oracle Drive, Nashua, NH and Indiana University; Computer Science Dept., Indiana University, Bloomington, IN

  • Venue:
  • PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
  • Year:
  • 2001

Abstract

The uniform representation of 2-dimensional arrays serially in Morton order (or Z order) supports both their iterative scan with Cartesian indices and their divide-and-conquer manipulation as quaternary trees. This data structure is important because it relaxes serious problems of locality and latency, and the tree helps to schedule multi-processing. Results here show how it facilitates algorithms that avoid cache misses and page faults at all levels in hierarchical memory, independently of a specific runtime environment.

We have built a rudimentary C-to-C translator that implements matrices in Morton-order from source that presumes a row-major implementation. Early performance from LAPACK's reference implementation of dgesv (linear solver), and all its supporting routines (including dgemm matrix-multiplication), forms a successful research demonstration. Its performance predicts improvements from new algebra in back-end optimizers.

We also present results from a more stylish dgemm algorithm that takes better advantage of this representation. With only routine back-end optimizations inserted by hand (unfolding the base case and passing arguments in registers), we achieve machine performance exceeding that of the manufacturer-crafted dgemm running at 67% of peak flops. And the same code performs similarly on several machines.

Together, these results show how existing codes and future block-recursive algorithms can work well together on this matrix representation. Locality is key to future performance, and the new representation has a remarkable impact.
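
The two views of a Morton-order matrix described in the abstract, Cartesian indexing and quaternary-tree decomposition, can be illustrated with a minimal C sketch. It is not the paper's translator output or its tuned dgemm. The function names (spread_bits, morton_index, quadrant_base, morton_matmul) are hypothetical, and the sketch assumes a square matrix whose order is a power of two, with row bits placed in the odd bit positions and column bits in the even ones, so that quadrants are stored in the order NW, NE, SW, SE.

```c
#include <stdint.h>

/* Spread the low 16 bits of x into the even bit positions of the result. */
static uint32_t spread_bits(uint32_t x)
{
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Cartesian view: map (row, col) to a Morton offset by interleaving bits.
   Assumed convention: row bits in odd positions, column bits in even ones. */
uint32_t morton_index(uint32_t row, uint32_t col)
{
    return (spread_bits(row) << 1) | spread_bits(col);
}

/* Quadtree view: offset of quadrant q (0=NW, 1=NE, 2=SW, 3=SE) of an
   n-by-n block that starts at offset base. Each quadrant is contiguous. */
uint32_t quadrant_base(uint32_t base, uint32_t n, uint32_t q)
{
    return base + q * (n / 2) * (n / 2);
}

/* Block-recursive C += A * B on n-by-n Morton-order matrices
   (n a power of two). Recursion bottoms out at single elements;
   a practical version would unfold a larger base case. */
void morton_matmul(double *c, const double *a, const double *b, uint32_t n)
{
    if (n == 1) {
        c[0] += a[0] * b[0];
        return;
    }
    uint32_t h = n / 2;
    uint32_t q = h * h;          /* number of elements in one quadrant */

    /* C_NW += A_NW*B_NW + A_NE*B_SW */
    morton_matmul(c,         a,         b,         h);
    morton_matmul(c,         a + q,     b + 2 * q, h);
    /* C_NE += A_NW*B_NE + A_NE*B_SE */
    morton_matmul(c + q,     a,         b + q,     h);
    morton_matmul(c + q,     a + q,     b + 3 * q, h);
    /* C_SW += A_SW*B_NW + A_SE*B_SW */
    morton_matmul(c + 2 * q, a + 2 * q, b,         h);
    morton_matmul(c + 2 * q, a + 3 * q, b + 2 * q, h);
    /* C_SE += A_SW*B_NE + A_SE*B_SE */
    morton_matmul(c + 3 * q, a + 2 * q, b + q,     h);
    morton_matmul(c + 3 * q, a + 3 * q, b + 3 * q, h);
}
```

Under these assumptions, morton_index(2, 2) and quadrant_base(0, 4, 3) both give the offset of the first element of the SE quadrant of a 4-by-4 matrix, which is why the same storage serves both an index-based scan and a recursive descent. Every recursive call in morton_matmul operates on a contiguous slice of memory, which is the locality property the abstract emphasizes.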