We show how to transform the B-spline curve and surface fitting problems into suffix computations of continued fractions. A parallel substitution scheme is then introduced to compute the suffix values on a newly proposed mesh-of-unshuffle network. The derived parallel algorithm allows curve interpolation through n points to be solved in O(log n) time using Θ(n/log n) processors, and surface interpolation through m × n points to be solved in O(log m log n) time using Θ(mn/(log m log n)) processors. Both interpolation algorithms are cost-optimal for their respective problems. In addition, the surface fitting problem can be solved even faster, in O(log m + log n) time, if Θ(mn) processors are used in the network.
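
To illustrate why suffix values of a continued fraction can be computed in logarithmically many parallel steps, here is a minimal Python sketch. It is not the substitution scheme or the mesh-of-unshuffle network described above. It swaps in a generic doubling suffix scan, simulated sequentially, which uses n logical processors and is therefore not cost-optimal. The recurrence s[i] = b[i] + a[i] / s[i+1] with s[n-1] = b[n-1], the function names, and the sample coefficients are all assumptions made for illustration. The sketch also assumes no suffix value is zero or infinite.

```python
from fractions import Fraction

def mat_mul(p, q):
    """Multiply two 2x2 matrices stored as flat tuples (row-major)."""
    (a, b, c, d), (e, f, g, h) = p, q
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def suffix_values_sequential(a, b):
    """Reference: evaluate s[i] = b[i] + a[i] / s[i+1] from the right end."""
    n = len(b)
    s = [None] * n
    s[n - 1] = Fraction(b[n - 1])
    for i in range(n - 2, -1, -1):
        s[i] = b[i] + a[i] / s[i + 1]
    return s

def suffix_values_scan(a, b):
    """Same values via a doubling suffix scan over 2x2 matrices.

    Each step x -> b[i] + a[i] / x is the linear fractional map with
    matrix [[b[i], a[i]], [1, 0]]. Matrix products are associative, so
    all suffix products can be built in ceil(log2 n) rounds. Each round
    is one list comprehension here; on a parallel machine every index
    would be updated simultaneously.
    """
    n = len(b)
    prods = [(b[i], a[i], 1, 0) for i in range(n)]
    d, rounds = 1, 0
    while d < n:
        prods = [
            mat_mul(prods[i], prods[i + d]) if i + d < n else prods[i]
            for i in range(n)
        ]
        d *= 2
        rounds += 1
    # Apply each suffix product to the point at infinity, (1, 0):
    # the result is the first column, read as numerator / denominator.
    return [Fraction(p[0], p[2]) for p in prods], rounds

# Hypothetical coefficients resembling a (1, 4, 1) tridiagonal system.
a = [1, 1, 1, 1]
b = [4, 4, 4, 4]
values, rounds = suffix_values_scan(a, b)
print(values, rounds)
```

Using exact `Fraction` arithmetic lets the scan result be compared directly against the sequential recurrence without rounding concerns.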