Toward scalable matrix multiply on multithreaded architectures

  • Authors:
  • Bryan Marker;Field G. Van Zee;Kazushige Goto;Gregorio Quintana-Ortí;Robert A. van de Geijn

  • Affiliations:
  • National Instruments;The University of Texas at Austin;The University of Texas at Austin;Universidad Jaume I, Spain;The University of Texas at Austin

  • Venue:
  • Euro-Par '07: Proceedings of the 13th International Euro-Par Conference on Parallel Processing
  • Year:
  • 2007

Abstract

We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will likely also affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple one-dimensional data partitioning is suboptimal in the context of dense linear algebra operations and hinders scalability. In addition, we advocate the publishing of low-level interfaces to supporting operations, such as the copying of data to contiguous memory, so that library developers may further optimize parallel linear algebra implementations. Data collected on a 16-CPU Itanium2 server support these observations.
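The contrast between one-dimensional and two-dimensional partitioning can be illustrated with a simple counting model. The sketch below is not taken from the paper: the function name `elements_read_per_thread`, the problem size, and the assumption that each thread computes one block of C = AB from a row panel of A and a column panel of B are all illustrative choices, and the model ignores caches, blocking, and data shared between threads.

```python
def elements_read_per_thread(n, rows, cols):
    """Count the elements of A and B that one thread reads for C = A @ B.

    Hypothetical model: A, B, C are n x n and the threads form a
    rows x cols grid over C. Each thread owns one (n/rows) x (n/cols)
    block of C, which needs a row panel of A and a column panel of B.
    Assumes n is divisible by both rows and cols.
    """
    if n % rows or n % cols:
        raise ValueError("n must be divisible by rows and cols")
    a_panel = (n // rows) * n   # (n/rows) x n row panel of A
    b_panel = n * (n // cols)   # n x (n/cols) column panel of B
    return a_panel + b_panel


n = 4096        # illustrative matrix dimension
threads = 16    # matches the 16 CPUs mentioned in the abstract

# One-dimensional partitioning: C is split into 16 column panels,
# so every thread reads all of A.
one_d = elements_read_per_thread(n, rows=1, cols=threads)

# Two-dimensional partitioning: C is split into a 4 x 4 grid of blocks.
two_d = elements_read_per_thread(n, rows=4, cols=4)

print(one_d, two_d, one_d / two_d)
```

Under these assumptions, each thread in the one-dimensional case reads roughly twice as many operand elements as in the 4 x 4 case, and the gap widens as the thread count grows because the full copy of A is never divided. This is only a rough indication of why a one-dimensional partitioning might scale poorly; the measured behavior reported in the paper depends on details the abstract does not give.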