We propose a set of three memory performance measures directed at vector multiprocessors. The first is the port reservation time, which is closely related to the commonly used memory bandwidth measure. The second is the vector fill time, the latency through the memory system for an entire vector operation. The third is the slowest element time, the highest effective latency among all the elements of a vector. The three measures are sufficient to characterize the memory system's influence on the processor's usage of memory ports, functional units, and vector registers, the three main resources that determine vector performance.

Simulation results for a next-generation-class vector multiprocessor are given to illustrate typical values for the measures and their interrelationships. These results display a type of bimodal performance behavior in which performance is better at both high and low vectorization levels than at moderate vectorization levels. The results are also used with a simple code sequence to illustrate the effect of memory system delays on chained and non-chained performance. These results suggest that chaining may be more efficient if longer vector lengths are used.
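To make the three measures concrete, the following is a minimal sketch of how they might be computed from a per-element trace of one vector load. It is an illustration only and not the paper's simulator. The `ElementTiming` record, the `memory_measures` function, and the cycle-level definitions used here are assumptions: the port is taken to be reserved from the first to the last element request, the fill time is measured from the first request to the last data return, and an element's effective latency is taken as its return cycle minus its issue cycle. The paper's precise definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ElementTiming:
    """Hypothetical trace record for one vector element (cycle numbers)."""
    issue_cycle: int   # cycle the request leaves the memory port
    return_cycle: int  # cycle the data arrives at the vector register


def memory_measures(elements: List[ElementTiming]) -> Dict[str, int]:
    """Compute the three measures under the assumptions stated above."""
    if not elements:
        raise ValueError("a vector operation needs at least one element")

    first_issue = min(e.issue_cycle for e in elements)
    last_issue = max(e.issue_cycle for e in elements)
    last_return = max(e.return_cycle for e in elements)

    return {
        # Assumed: the port is busy from the first to the last request.
        "port_reservation_time": last_issue - first_issue + 1,
        # Assumed: latency of the whole vector operation through memory.
        "vector_fill_time": last_return - first_issue,
        # Assumed: worst per-element latency (return minus issue).
        "slowest_element_time": max(
            e.return_cycle - e.issue_cycle for e in elements
        ),
    }


# Made-up four-element trace in which the third element is delayed,
# for example by a bank conflict.
trace = [
    ElementTiming(0, 10),
    ElementTiming(1, 11),
    ElementTiming(2, 20),
    ElementTiming(3, 14),
]
measures = memory_measures(trace)
print(measures)
```

In this made-up trace a single delayed element raises both the vector fill time and the slowest element time while leaving the port reservation time unchanged, which is one way to see why a bandwidth-style measure alone would miss the delay.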