Implementation of a divide and conquer cyclic reduction algorithm on the FPS T-20 hypercube

Authors:
C. L. Cox
Affiliations:
Dept. of Mathematical Sciences, Clemson University, Clemson, SC
Venue:
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
Year:
1989

Abstract

A simple variant of the odd-even cyclic reduction algorithm for solving tridiagonal linear systems is presented. The target architecture is a parallel computer whose nodes are vector processors, such as the Floating Point Systems T-Series hypercube. Of particular interest is the case where the number of equations is much larger than the number of processors. The matrix system is partitioned into local subsystems, with the partitioning governed by a parameter that determines the amount of redundancy in the computations. After the local systems are distributed, the algorithm proceeds through independent computations, an all-to-all broadcast of a small number of equations from each processor, solution of the subsystem formed by these equations, further independent computations, and output of the solution. Some redundancy in calculations between neighboring processors minimizes communication costs. One feature of this approach is that computations are well balanced, as each processor executes an identical algebraic routine.

A brief description of the standard cyclic reduction algorithm is given. Then the divide and conquer strategy is presented along with some estimates of speedup and efficiency. Finally, an Occam program for this algorithm which runs on the FPS T-20 computer is discussed along with experimental results.
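
For orientation, the standard odd-even cyclic reduction that the abstract refers to can be sketched serially as follows. This is a minimal illustration in Python, not the paper's divide and conquer variant and not its Occam program. The function name `cyclic_reduction`, the coefficient layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side) and the assumption that no pivot vanishes (for example, a diagonally dominant matrix) are choices made here for illustration only.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    Illustrative assumptions: a[0] == 0, c[-1] == 0, and every pivot
    b[i] met during the reduction is nonzero.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: use the neighbouring even-indexed equations to eliminate
    # x[i-1] and x[i+1] from each odd-indexed equation i. The result is a
    # tridiagonal system of about half the size in the odd unknowns only.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_c = 0.0
        new_d = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            new_b -= gamma * a[i + 1]
            new_c = -gamma * c[i + 1]
            new_d -= gamma * d[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # Solve the reduced system for the odd-indexed unknowns.
    x_odd = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover each even-indexed unknown from its own
    # equation, using the odd-indexed neighbours that are now known.
    x = [0.0] * n
    x[1::2] = x_odd
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Within one reduction level the loop iterations are independent of each other, which is what makes the method attractive for vector and parallel hardware. How the paper partitions the system across processors and how much work it duplicates between neighbors is governed by its redundancy parameter, which this sketch does not model.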