Implementation of an ADI method on parallel computers
Journal of Scientific Computing
Parallel Computers Two: Architecture, Programming and Algorithms
Parallel Computers Two: Architecture, Programming and Algorithms
Vector performance analysis of the NEC SX-2
ICS '90 Proceedings of the 4th international conference on Supercomputing
A victim cache for vector registers
ICS '97 Proceedings of the 11th international conference on Supercomputing
Vector architectures: past, present and future
ICS '98 Proceedings of the 12th international conference on Supercomputing
Implications of memory performance for highly efficient supercomputing of scientific applications
ISPA'06 Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications
Hi-index | 0.00 |
This paper presents the results of a series of experiments to study the single processor performance of three supercomputers: Cray-2, Cray Y-MP, and ETA10-Q. The main object of this study is to determine the impact of certain architectural features on the performance of modern supercomputers. Features such as clock period, memory links, memory organization, multiple functional units, and chaining are considered here. A simple performance model is used to examine the impact of these features on the performance of a set of basic operations. The results of implementing this set on these machines for three vector lengths and three memory strides are presented and compared. For unit stride operations, the Cray Y-MP outperformed the Cray-2 by as much as three times and the ETA10-Q by as much as four times for these operations. Moreover, unlike the Cray-2 and ETA10-Q, even-numbered strides do not cause a major performance degradation on the Cray Y-MP. Two numerical algorithms are also used for comparison. For three problem sizes of both algorithms, the Cray Y-MP outperformed the Cray-2 by 43% to 68% and the ETA10-Q by four to eight times.