SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks

  • Authors:
  • Ernie Chan;Field G. Van Zee;Paolo Bientinesi;Enrique S. Quintana-Orti;Gregorio Quintana-Orti;Robert van de Geijn

  • Affiliations:
  • The University of Texas at Austin, Austin, TX, USA;The University of Texas at Austin, Austin, TX, USA;Duke University, Durham, NC, USA;Universidad Jaume I, Castellon, Spain;Universidad Jaume I, Castellon, Spain;The University of Texas at Austin, Austin, TX, USA

  • Venue:
  • Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes SuperMatrix, a runtime system that parallelizes matrix operations for SMP and/or multi-core architectures. We use this system to demonstrate how code described at a high level of abstraction can achieve high performance on such architectures while completely hiding the parallelism from the library programmer. The key insight entails viewing matrices hierarchically, consisting of blocks that serve as units of data where operations over those blocks are treated as units of computation. The implementation transparently enqueues the required operations, internally tracking dependencies, and then executes the operations utilizing out-of-order execution techniques inspired by superscalar microarchitectures. This separation of concerns allows library developers to implement algorithms without concerning themselves with the parallelization aspect of the problem. Different heuristics for scheduling operations can be implemented in the runtime system independent of the code that enqueues the operations. Results gathered on a 16 CPU ccNUMA Itanium2 server demonstrate excellent performance.