This paper derives the scalability properties of a parallel vector computer with shared memory from a model of such a machine. The processor-memory interconnection network is assumed to be built from $b \times b$ crossbar switches. The analysis considers sustainable peak performance under optimal conditions, i.e., no memory bank conflicts, sufficient processor-memory bank pathways, and no interconnection network conflicts. It is shown that, even with fully vectorizable algorithms and no communication overhead, sustainable peak performance does not scale linearly with the number of processors $p$. If the interconnection network is unbuffered, the number of memory banks must grow at least as $O(p \log_b p)$ to sustain peak performance. If the network is buffered, this bottleneck can be alleviated; however, the half-performance vector length $n_{1/2}$ still grows as $O(\log_b p)$. The paper confirms the validity of the model by examining the performance behavior of the LINPACK benchmark.
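The scaling claims above can be made concrete with a small back-of-envelope model. The sketch below is not the paper's model; it rests on stated assumptions. A multistage network of $b \times b$ crossbars connecting $p$ ports needs $\lceil \log_b p \rceil$ stages. In an unbuffered network, each request is assumed to hold its bank for the bank busy time plus a round trip through every stage. If each of the $p$ processors issues one request per cycle, the number of banks must be at least $p$ times that occupancy (Little's law). For the half-performance length, the vector start-up latency is assumed to grow by a fixed amount per stage, and the result is plugged into Hockney's model $r(n) = r_\infty \, n / (n + n_{1/2})$. All function names and constants (`bank_busy`, `stage_delay`, `n_half_0`) are hypothetical placeholders, not values from the paper.

```python
"""Back-of-envelope scaling sketch for a shared-memory vector machine whose
processor-memory network is a multistage network of b x b crossbars.

ASSUMPTION: all constants below (bank_busy, stage_delay, n_half_0) are
hypothetical placeholders, not values taken from the paper.
"""


def num_stages(p: int, b: int) -> int:
    """Smallest k with b**k >= p, i.e. ceil(log_b p).

    Uses integer arithmetic to avoid floating-point rounding in log().
    """
    if p < 1 or b < 2:
        raise ValueError("need p >= 1 and b >= 2")
    k, reach = 0, 1
    while reach < p:
        reach *= b
        k += 1
    return k


def banks_unbuffered(p: int, b: int, bank_busy: int = 4, stage_delay: int = 1) -> int:
    """Banks needed so p processors can each issue one request per cycle.

    ASSUMPTION: an unbuffered network keeps a bank (and its path) occupied
    for the bank busy time plus a round trip through all network stages.
    By Little's law, banks >= request rate (p per cycle) * occupancy.
    """
    occupancy = bank_busy + 2 * stage_delay * num_stages(p, b)
    return p * occupancy  # grows as O(p log_b p)


def n_half(p: int, b: int, n_half_0: float = 10.0, stage_delay: float = 1.0) -> float:
    """Half-performance vector length.

    ASSUMPTION: vector start-up latency grows by a fixed amount per stage,
    so n_1/2 grows as O(log_b p).
    """
    return n_half_0 + 2 * stage_delay * num_stages(p, b)


def vector_rate(n: int, r_inf: float, nh: float) -> float:
    """Hockney's model: r(n) = r_inf * n / (n + n_1/2)."""
    return r_inf * n / (n + nh)


if __name__ == "__main__":
    b = 2  # 2 x 2 crossbar switches
    for p in (1, 4, 16, 64, 256):
        k = num_stages(p, b)
        print(f"p={p:4d} stages={k} banks={banks_unbuffered(p, b):5d} "
              f"n_half={n_half(p, b):5.1f}")
```

Under these assumptions, banks per processor rise with the number of stages rather than staying constant, and a vector must grow longer as $p$ increases to reach half of $r_\infty$. This matches the direction of the abstract's two bounds. A buffered network would change the occupancy term, which this sketch does not attempt to model.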