Communication in the KSR1 MPP: performance evaluation using synthetic workload experiments

Authors:
Eric L. Boyd;Edward S. Davidson
Affiliations:
Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan;Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, University of Michigan
Venue:
ICS '94 Proceedings of the 8th international conference on Supercomputing
Year:
1994

Abstract

We have developed an automatic technique for evaluating the communication performance of massively parallel processors (MPPs). Both communication latency and the amount of communication are investigated as a function of a few basic parameters that characterize an application workload. Parameter values are captured in an automatically generated sparse matrix that multiplies a dense vector in the synthetic workload. Our approach is capable of explaining the degradation of processor performance caused by communication.

Using the Kendall Square Research KSR1 MPP as a case study, we demonstrate the effectiveness of the technique through a series of experiments used to characterize the communication performance. We show that read and write communication latencies vary from 150 to 180 and from 80 to 100 processor cycles, respectively. We show that the read communication latency approximates a linear function of the total system communication (in subpages), that the write communication latency approximates a linear function of the number of distinct shared subpages, and that KSR's automatic update feature is effective in reducing the number of read communications given careful binding of threads to processors.
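The synthetic workload idea in the abstract can be illustrated with a minimal sketch. It is not the authors' generator: the parameter names (`nnz_per_row`, `remote_frac`), the block partitioning of rows and vector elements across processors, the use of unit-valued nonzeros, and the choice of 16 vector elements per subpage are all assumptions made for illustration. The sketch builds a sparse matrix whose column pattern sets how often a processor references vector elements owned by another processor, multiplies it by a dense vector, and counts the distinct remote subpages each processor touches.

```python
import random

# Assumption: one subpage holds 16 vector elements (e.g. 128 bytes of 8-byte values).
SUBPAGE_ELEMS = 16


def make_sparse_rows(n, nnz_per_row, n_procs, remote_frac, seed=0):
    """Build a synthetic n x n sparse pattern as a list of column-index lists.

    Rows and vector elements are block-partitioned across n_procs.
    Each nonzero column falls in another processor's block with
    probability remote_frac, otherwise in the row owner's block.
    """
    assert n % n_procs == 0 and (n // n_procs) % SUBPAGE_ELEMS == 0
    rng = random.Random(seed)
    block = n // n_procs
    rows = []
    for i in range(n):
        owner = i // block
        cols = []
        for _ in range(nnz_per_row):
            if n_procs > 1 and rng.random() < remote_frac:
                p = rng.choice([q for q in range(n_procs) if q != owner])
            else:
                p = owner
            cols.append(p * block + rng.randrange(block))
        rows.append(cols)
    return rows


def spmv(rows, x):
    """Multiply the pattern (every nonzero taken as 1.0) by the dense vector x."""
    return [sum(x[j] for j in cols) for cols in rows]


def remote_subpages(rows, n_procs):
    """Count, per processor, the distinct remote subpages its rows reference."""
    block = len(rows) // n_procs
    touched = [set() for _ in range(n_procs)]
    for i, cols in enumerate(rows):
        owner = i // block
        for j in cols:
            if j // block != owner:
                touched[owner].add(j // SUBPAGE_ELEMS)
    return [len(s) for s in touched]


if __name__ == "__main__":
    rows = make_sparse_rows(n=256, nnz_per_row=4, n_procs=4, remote_frac=0.25)
    y = spmv(rows, [1.0] * 256)
    print("remote subpages per processor:", remote_subpages(rows, 4))
```

Sweeping `remote_frac` or `nnz_per_row` changes the amount of communication the workload demands. On a real machine the measured run time of the multiply would be compared against these counts, whereas this sketch only reports the counts.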