Parallel ADI solver based on processor scheduling

  • Authors:
  • Alex Povitsky

  • Affiliations:
  • Department of Mechanical Engineering, Concordia University, 1455 de Maisonneuve Blvd, H-549, Montreal, Quebec, Canada H3G 1M8

  • Venue:
  • Applied Mathematics and Computation
  • Year:
  • 2002

Quantified Score

Hi-index 0.48

Visualization

Abstract

Gaussian elimination is used for the direct solution of banded linear systems that typically appear in implicit numerical methods for PDEs. Gaussian elimination for narrow-banded systems (also known as the Thomas algorithm (TA)) includes forward and backward recurrences along lines of a numerical grid. Multi-domain decomposition, essential for parallelization of implicit solvers, spans the recurrences across processors in one or more directions. Processor idle time and inter-processor communication time are two interdependent reasons for the poor parallelization efficiency of TAs. In this research an efficient parallel algorithm for 3D directionally split problems is developed. The proposed solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. The proposed algorithm uses a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines. This algorithm has data available for other computational tasks while processors are idle from the TA. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm considerably reduces the communication cost and processor idle time over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.