Gaussian elimination is used for the direct solution of the banded linear systems that typically appear in implicit numerical methods for PDEs. Gaussian elimination for narrow-banded systems, also known as the Thomas algorithm (TA), consists of forward and backward recurrences along the lines of a numerical grid. Multi-domain decomposition, which is essential for the parallelization of implicit solvers, spreads these recurrences across processors in one or more directions. Processor idle time and inter-processor communication time are two interdependent reasons for the poor parallelization efficiency of the TA. In this research, an efficient parallel algorithm for 3D directionally split problems is developed. The proposed solver is based on static scheduling of processors, in which local and non-local, data-dependent and data-independent computations are scheduled for the periods when processors would otherwise be idle. The proposed algorithm uses a reformulated version of the pipelined Thomas algorithm that starts the backward-step computations immediately after the forward-step computations are completed for the first portion of lines. As a result, data become available for other computational tasks while processors are idle from the TA. A theoretical model of parallelization efficiency is used to define the optimal parameters of the algorithm, to show an asymptotic parallelization penalty, and to obtain an optimal cover of a global domain with subdomains. Computational experiments and the theoretical model show that the proposed algorithm considerably reduces communication cost and processor idle time relative to the basic algorithm, over the range of numbers of processors (subdomains) and numbers of grid nodes per subdomain considered.
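For orientation, the forward and backward recurrences of the serial Thomas algorithm along a single grid line can be sketched as follows. This is a minimal illustration of the textbook tridiagonal case only, not the pipelined or scheduled parallel algorithm described above. The function name `thomas_solve` and the argument layout (`a` sub-diagonal, `b` main diagonal, `c` super-diagonal, `d` right-hand side) are assumptions made for this example, and the sketch assumes a system for which no pivoting is needed (for instance, a diagonally dominant one).

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    Illustrative layout (an assumption of this sketch): all four lists
    have length n; a[0] and c[n-1] are unused. No pivoting is performed.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side

    # Forward recurrence: eliminate the sub-diagonal along the line.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Backward recurrence: back-substitute from the last node to the first.
    x = [0.0] * n
    x[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Small 4-node example with the stencil (-1, 2, -1).
    a = [0.0, -1.0, -1.0, -1.0]
    b = [2.0, 2.0, 2.0, 2.0]
    c = [-1.0, -1.0, -1.0, 0.0]
    d = [0.0, 0.0, 0.0, 5.0]
    print(thomas_solve(a, b, c, d))
```

Each step of both loops depends on the result of the previous step. When a grid line is split across subdomains, this dependence is what forces a processor to wait for its neighbour, which is the source of the idle time discussed in the abstract.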