A comparison of parallel processing on CRAY X-MP and IBM 3090 VF multiprocessors

  • Authors:
  • F. Szelényi; W. E. Nagel

  • Affiliations:
  • IBM European Center for Scientific and Engineering Computing, Via Giorgione 159, 00147 Rome, Italy and University of Innsbruck, Austria; Zentralinstitut für Angewandte Mathematik (ZAM), Kernforschungsanlage Jülich GmbH, Postfach 1913, 5170 Jülich, Fed. Rep. Germany

  • Venue:
  • ICS '89 Proceedings of the 3rd international conference on Supercomputing
  • Year:
  • 1989

Abstract

Modern supercomputers like the CRAY X-MP and IBM 3090 VF achieve their high computing speed by using both vector and parallel hardware. The available multitasking concepts supporting concurrent execution of tasks within a single application have been designed for different purposes: owing to the small dispatching overhead, fine-grain parallelism allows parallelization of small units of computation, usually chunks of a DO loop. Larger units of computation, such as arithmetic-intensive subroutines, may be processed independently using coarse-grain parallelism. This paper gives an introduction to the concepts of CRAY macro- and microtasking, and of the IBM Multitasking Facility (MTF), the ECSEC microtasking prototype, and Parallel FORTRAN. Basic parallelization using fine-grain as well as coarse-grain techniques has been applied to linear algebra kernels, consisting of matrix multiplication and LU decomposition, and to an application program simulating a Czochralski bulk flow describing a crystal-growing system. Depending on the problem, a parallel speedup of nearly four (on the CRAY X-MP/416) and nearly six (on the IBM 3090-600E) can be achieved for the implementation of the matrix multiplication. All other kernels and the application program were limited by serialization overheads arising from memory conflicts (bank and section conflicts on CRAY, cache coherence on IBM) and multitasking primitive overheads. However, with a careful implementation a parallel efficiency of more than 0.9 can be obtained on both multiprocessors.
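
As a rough illustration of the fine-grain idea in the abstract (splitting the iterations of a loop into chunks that are handed to separate workers), the following Python sketch partitions the rows of a matrix product. It is not the paper's code, which targets FORTRAN multitasking facilities. The names `chunk_ranges`, `matmul_rows`, `parallel_matmul`, and `efficiency` are hypothetical, and Python threads are assumed here only to show the decomposition; in CPython they would not be expected to reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n, workers):
    """Split range(n) into at most `workers` contiguous, near-equal chunks."""
    base, extra = divmod(n, workers)
    ranges, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        if size:  # skip empty chunks when n < workers
            ranges.append((start, start + size))
        start += size
    return ranges


def matmul_rows(a, b, lo, hi):
    """Compute rows lo..hi-1 of the product a*b (the work of one loop chunk)."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(lo, hi)
    ]


def parallel_matmul(a, b, workers=4):
    """Hand each chunk of the outer (row) loop to its own worker."""
    ranges = chunk_ranges(len(a), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so rows stay in order.
        parts = list(pool.map(lambda r: matmul_rows(a, b, r[0], r[1]), ranges))
    return [row for part in parts for row in part]


def efficiency(speedup, processors):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / processors
```

With this definition, an efficiency of 0.9 corresponds to a speedup of 3.6 on the four processors of the CRAY X-MP/416 and 5.4 on the six processors of the IBM 3090-600E, which is consistent with the "nearly four" and "nearly six" figures quoted for matrix multiplication.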