A comparison of parallel processing on CRAY X-MP and IBM 3090 VF multiprocessors

Authors:
F. Szelényi; W. E. Nagel
Affiliations:
IBM European Center for Scientific and Engineering Computing, Via Giorgione 159, 00147 Rome, Italy and University of Innsbruck, Austria; Zentralinstitut für Angewandte Mathematik (ZAM), Kernforschungsanlage Jülich GmbH, Postfach 1913, 5170 Jülich, Fed. Rep. Germany
Venue:
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Year:
1989

Abstract

Modern supercomputers such as the CRAY X-MP and IBM 3090 VF achieve their high computing speed by using both vector and parallel hardware. The available multitasking concepts supporting concurrent execution of tasks within a single application have been designed for different purposes: owing to the small dispatching overhead, fine-grain parallelism allows parallelization of small units of computation, usually chunks of a DO loop. Larger units of computation, such as arithmetic-intensive subroutines, may be processed independently using coarse-grain parallelism.

This paper gives an introduction to the concepts of CRAY macro- and microtasking, and of the IBM Multitasking Facility (MTF), the ECSEC microtasking prototype, and Parallel FORTRAN. Basic parallelization techniques, both fine-grain and coarse-grain, have been applied to linear algebra kernels (matrix multiplication and LU decomposition) and to an application program simulating a Czochralski bulk flow in a crystal-growing system. Depending on the problem, it can be shown that a parallel speedup of nearly four (on the CRAY X-MP/416) and nearly six (on the IBM 3090-600E) can be achieved for the implementation of the matrix multiplication. All other kernels and the application program were limited by serialization overheads arising from memory conflicts (bank and section conflicts on CRAY, cache coherence on IBM) and multitasking primitive overheads. However, with a careful implementation, a parallel efficiency of more than 0.9 can be obtained on both multiprocessors.
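
The fine-grain idea described in the abstract, handing chunks of an outer DO loop to separate processors, can be illustrated with a minimal Python sketch. It is not the authors' FORTRAN code and does not use any CRAY or IBM multitasking primitive; the names `row_chunks`, `matmul_rows`, `parallel_matmul`, and `parallel_efficiency` are hypothetical, and the thread pool only stands in for the hardware processors (CPython threads will not reproduce the reported speedups). The last helper assumes the usual definition of parallel efficiency as speedup divided by processor count.

```python
from concurrent.futures import ThreadPoolExecutor


def row_chunks(n_rows, n_workers):
    """Split range(n_rows) into at most n_workers contiguous chunks."""
    base, extra = divmod(n_rows, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if w < extra else 0)
        if size:
            chunks.append(range(start, start + size))
        start += size
    return chunks


def matmul_rows(a, b, rows):
    """Compute the rows of the product a*b listed in `rows` (one loop chunk)."""
    n_inner, n_cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in rows
    ]


def parallel_matmul(a, b, n_workers=4):
    """Multiply nested-list matrices, one chunk of output rows per worker."""
    chunks = row_chunks(len(a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in chunk order, so rows stay sorted.
        parts = list(pool.map(lambda rows: matmul_rows(a, b, rows), chunks))
    return [row for part in parts for row in part]


def parallel_efficiency(speedup, n_processors):
    """Assumed definition: efficiency = speedup / number of processors."""
    return speedup / n_processors
```

Each worker writes a disjoint set of output rows, so no synchronization is needed inside the loop body. Under the assumed definition, an efficiency above 0.9 corresponds to a speedup above 3.6 on four processors or above 5.4 on six.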