The impact of memory organization on the performance of matrix multiplication

Authors:
J.-Fr. Hake; W. Homberg
Affiliations:
Forschungszentrum Juelich GmbH (KFA), D-5170 Juelich, Fed. Rep. Germany; Forschungszentrum Juelich GmbH (KFA), D-5170 Juelich, Fed. Rep. Germany
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Abstract

Matrix multiplication can serve as a model problem for analyzing the performance of more complex algorithms. On CRAY and IBM computer systems, library routines for this task operate at high megaflop rates. Other programs from numerical linear algebra do not always achieve this level of sophistication; for example, they can suffer from performance degradation caused by memory access conflicts. This effect has been studied by examining the performance of matrix multiplication subroutines on the CRAY X-MP, CRAY Y-MP, and IBM 3090. The results are analyzed by means of simulation.
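
To make the notion of a memory access conflict concrete, the following is a minimal sketch of a toy interleaved-memory model. It is not the authors' simulator. The function name `cycles_for_strided_access`, the bank count of 16, and the bank busy time of 4 cycles are illustrative assumptions and do not describe the actual CRAY or IBM hardware. The model issues at most one word request per cycle and stalls whenever the target bank is still busy with an earlier request.

```python
def cycles_for_strided_access(n_accesses, stride, n_banks=16, bank_busy=4):
    """Count cycles needed to issue n_accesses word loads at a fixed stride.

    Toy model (assumed parameters, not real machine values):
    - word address a lives in bank a % n_banks (low-order interleaving)
    - a bank that accepts a request stays busy for bank_busy cycles
    - at most one request is issued per cycle, in order
    """
    free_at = [0] * n_banks  # first cycle at which each bank is free again
    cycle = 0
    for i in range(n_accesses):
        bank = (i * stride) % n_banks
        cycle = max(cycle, free_at[bank])  # stall while the bank is busy
        free_at[bank] = cycle + bank_busy  # bank is occupied by this request
        cycle += 1                         # next request issues one cycle later at the earliest
    return cycle


if __name__ == "__main__":
    # Compare unit stride with strides that hit few banks or only one bank.
    for stride in (1, 8, 16, 17):
        print(stride, cycles_for_strided_access(64, stride))
```

In this model a stride that is a multiple of the bank count sends every request to the same bank, so each access waits out the full busy time, while a stride of 1 or 17 cycles through all banks without stalling. In a column-major matrix, traversing a row uses a stride equal to the leading dimension, which is why the choice of leading dimension can matter for a matrix multiplication kernel under assumptions like these.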