Architecture Scalability of Parallel Vector Computers with a Shared Memory

Authors:
Eskil Dekker
Affiliations:
Delft Univ. of Technology, Delft, The Netherlands
Venue:
IEEE Transactions on Computers
Year:
1998




Abstract

This paper derives the scalability properties of a parallel vector computer with a shared memory from a model of the machine. The processor-memory interconnection network is assumed to be composed of crossbar switches of size b×b. The analysis covers sustainable peak performance under optimal conditions, i.e., no memory bank conflicts, sufficient processor-memory bank pathways, and no interconnection network conflicts. Even with fully vectorizable algorithms and no communication overhead, the sustainable peak performance does not scale up linearly with the number of processors p. If the interconnection network is unbuffered, the number of memory banks must grow at least as O(p log_b p) to sustain peak performance. If the network is buffered, this bottleneck can be alleviated; however, the half-performance vector length still grows as O(log_b p). The model is validated by examining the performance behavior of the LINPACK benchmark.
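
To make the growth rates in the abstract concrete, the following minimal sketch tabulates the logarithmic term for a network built from b×b crossbar switches. It is an illustration only, not the paper's model: the function names `stages` and `min_banks_unbuffered` and the constant factor `c` are hypothetical, and the abstract gives only the asymptotic order, not the constant.

```python
def stages(p: int, b: int) -> int:
    """Smallest n with b**n >= p, i.e. ceil(log_b p), computed with integers.

    This is the number of stages of b-by-b switches needed to reach p ports.
    """
    if p < 1 or b < 2:
        raise ValueError("need p >= 1 and b >= 2")
    n, reach = 0, 1
    while reach < p:
        reach *= b
        n += 1
    return n


def min_banks_unbuffered(p: int, b: int, c: int = 1) -> int:
    """Illustrative bank count c * p * ceil(log_b p).

    The factor c is an assumption; the abstract states only the
    O(p log_b p) order for the unbuffered case.
    """
    return c * p * stages(p, b)


if __name__ == "__main__":
    b = 4
    for p in (4, 16, 64, 256):
        print(p, stages(p, b), min_banks_unbuffered(p, b))
```

With b = 4 and c = 1, going from p = 16 to p = 256 raises the illustrative bank count from 32 to 1024, a factor of 32 for a 16-fold increase in processors. The same logarithmic term is the order at which the abstract says the half-performance vector length grows in the buffered case.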