Use of parallel level 3 BLAS in LU factorization on three vector multiprocessors: the ALLIANT FX/80, the CRAY-2, and the IBM 3090 VF

Authors:
M. J. Daydé; I. S. Duff
Affiliations:
M. J. Daydé: CERFACS, 42 Av. G. Coriolis, 31047 Toulouse Cedex, France, and ENSEEIHT-IRIT, 2 rue Camichel, 31071 Toulouse Cedex, France
I. S. Duff: CERFACS, 42 Av. G. Coriolis, 31047 Toulouse Cedex, France, and CSS Division, Harwell Laboratory, OXON OX11 0RA, England
Venue:
ICS '90 Proceedings of the 4th international conference on Supercomputing
Year:
1990

Abstract

We show how to transform the B-spline curve and surface fitting problems into suffix computations of continued fractions. A parallel substitution scheme is then introduced to compute the suffix values on a newly proposed mesh-of-unshuffle network. The derived parallel algorithm solves curve interpolation through n points in O(log n) time using Θ(n/log n) processors, and surface interpolation through m × n points in O(log m log n) time using Θ(mn/(log m log n)) processors. Both interpolation algorithms are cost-optimal for their respective problems. Moreover, the surface fitting problem can be solved even faster, in O(log m + log n) time, if Θ(mn) processors are used in the network.
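The abstract does not spell out the reduction or the mesh-of-unshuffle scheme, so the following is only a hedged sketch of why suffix values of a continued fraction can be computed in a logarithmic number of parallel steps. It assumes a continued fraction of the form x_n = t, x_i = a_i + b_i / x_{i+1}, and writes each step as the 2×2 matrix [[a_i, b_i], [1, 0]] (a linear fractional transformation). Because matrix multiplication is associative, all suffix products can be formed with a generic doubling (Hillis–Steele style) scan; this is a swapped-in textbook technique, not the authors' network algorithm, and the function names are hypothetical. As a sanity case, a_i = 4, b_i = -1 gives the recurrence x -> 4 - 1/x, the shape of the pivot recurrence for a (1, 4, 1) tridiagonal system such as uniform cubic B-spline interpolation produces.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple, Union

Number = Union[int, Fraction]
# A 2x2 matrix stored row-major as (m00, m01, m10, m11).
Mat = Tuple[Fraction, Fraction, Fraction, Fraction]


def matmul(x: Mat, y: Mat) -> Mat:
    """Return the 2x2 product x @ y (order matters: not commutative)."""
    a, b, c, d = x
    e, f, g, h = y
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


def suffix_values_sequential(a: Sequence[Number], b: Sequence[Number],
                             t: Number) -> List[Fraction]:
    """Reference: x_n = t and x_i = a_i + b_i / x_{i+1}, evaluated right to left."""
    n = len(a)
    x = [Fraction(0)] * (n + 1)
    x[n] = Fraction(t)
    for i in range(n - 1, -1, -1):
        x[i] = a[i] + b[i] / x[i + 1]
    return x


def suffix_values_scan(a: Sequence[Number], b: Sequence[Number],
                       t: Number) -> List[Fraction]:
    """Same values via a log-depth suffix scan over 2x2 matrices.

    Step i is the map x -> a_i + b_i / x, i.e. [[a_i, b_i], [1, 0]] acting on
    (p, q) with x = p / q.  After the round with offset d, s[i] holds the
    product M_i ... M_min(i+2d-1, n-1); the loop runs ceil(log2 n) rounds.
    Updates within a round are independent (simulated sequentially here).
    """
    n = len(a)
    s: List[Mat] = [(Fraction(a[i]), Fraction(b[i]), Fraction(1), Fraction(0))
                    for i in range(n)]
    d = 1
    while d < n:
        # Each i reads only the previous round's array, like a synchronous step.
        s = [matmul(s[i], s[i + d]) if i + d < n else s[i] for i in range(n)]
        d *= 2
    tf = Fraction(t)
    x: List[Fraction] = []
    for m00, m01, m10, m11 in s:
        # Apply the suffix product M_i ... M_{n-1} to the vector (t, 1).
        p = m00 * tf + m01
        q = m10 * tf + m11
        x.append(p / q)
    x.append(tf)
    return x


print(suffix_values_scan([1, 2], [3, 4], 5))
# [Fraction(29, 14), Fraction(14, 5), Fraction(5, 1)]
```

Exact `Fraction` arithmetic is used only so the scan and the right-to-left reference can be compared exactly; a floating-point version would typically rescale each matrix product, which leaves the ratio p/q unchanged.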