Fourier transform and convolution subroutines for the IBM 3090 Vector Facility
IBM Journal of Research and Development
A storage-efficient WY representation for products of Householder transformations
SIAM Journal on Scientific and Statistical Computing
On the convergence of the cyclic Jacobi method for parallel block orderings
SIAM Journal on Matrix Analysis and Applications
FFTs in external or hierarchical memory
The Journal of Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
LAPACK Users' Guide
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
NAS parallel benchmark results
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Fundamental limitations on the use of prefetching and stream buffers for scientific applications
Proceedings of the 2001 ACM symposium on Applied computing
Using Loop-Level Parallelism to Parallelize Vectorizable Programs
HIPS '01 Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments
Automatic benchmark generation for cache optimization of matrix operations
ACM-SE 33 Proceedings of the 33rd annual on Southeast regional conference
The memory behavior of cache oblivious stencil computations
The Journal of Supercomputing