Programming techniques necessary for high performance on the 3090 Vector Facilities are illustrated, showing that VS Fortran programs can achieve near-maximum execution rates. Relevant features of the 3090 architecture are reviewed, stressing the need to make efficient use of the hierarchical storage system and to take advantage of the compound vector instructions. The key programming techniques for managing the storage hierarchy are loop sectioning, loop distribution, and data compaction. Vector register reuse, cache reuse, and page reuse with an appropriate storage format are shown to lead to efficient use of the vector registers, the high-speed cache, and the virtual memory system, respectively. The multiply-and-add compound instruction is also discussed.
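As a rough illustration of loop sectioning combined with vector register reuse and a multiply-and-add inner step, the following sketch computes a matrix-vector product one section of rows at a time. It is not taken from the paper: Python stands in for VS Fortran, and the names `matvec_sectioned`, `matvec_plain`, and `SECTION`, as well as the section size of 128, are assumptions chosen for illustration. The matrix is held as a list of columns to mimic Fortran's column-major storage.

```python
SECTION = 128  # assumed section size (elements per vector register); illustrative only


def matvec_sectioned(a, x, section=SECTION):
    """Compute y = A x with the row loop split into sections.

    `a` is a list of columns (column-major layout, as in Fortran).
    Each section of y is accumulated in `acc`, which stands in for a
    vector register that is reused across every column before it is stored.
    """
    n_rows = len(a[0]) if a else 0
    y = [0.0] * n_rows
    for start in range(0, n_rows, section):
        stop = min(start + section, n_rows)
        acc = [0.0] * (stop - start)  # one "register" holding this section of y
        for j, col in enumerate(a):
            xj = x[j]
            # Multiply-and-add over the section: acc = acc + xj * col[start:stop]
            for i in range(start, stop):
                acc[i - start] += xj * col[i]
        y[start:stop] = acc  # single store per section
    return y


def matvec_plain(a, x):
    """Unsectioned reference: y is re-read and re-written for every column."""
    n_rows = len(a[0]) if a else 0
    y = [0.0] * n_rows
    for j, col in enumerate(a):
        for i in range(n_rows):
            y[i] += x[j] * col[i]
    return y
```

In the sectioned form, each piece of the result is loaded and stored once per section rather than once per column, which is the kind of reuse the abstract associates with efficient use of the vector registers. Actual behavior on the 3090 depends on the compiler and hardware and is not modeled here.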