We describe an efficient parallel and vector algorithm for solving huge eigenvector problems in quantum chemistry. An automatically adaptive, single-vector, iterative diagonalization method was also developed to reduce the memory requirement and avoid an I/O bottleneck. Our initial full-configuration interaction calculation solved for an eigenvector with 65 billion coefficients and was performed on 432 MSPs of the Oak Ridge National Laboratory Cray X1. One matrix-vector multiplication took about 4 minutes, and 25 iterations were required for a tightly converged result. The aggregate performance was 3.4 TFLOP/s (62% of peak speed).
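
The figures quoted above can be cross-checked with a short back-of-envelope calculation. The sketch below is illustrative only and rests on two assumptions not stated in the abstract: that each coefficient is stored as an 8-byte double, and that each iteration performs exactly one matrix-vector multiplication. All variable names are hypothetical.

```python
# Back-of-envelope check of the numbers quoted in the abstract.
# Assumptions (not from the source): 8-byte coefficients and
# one matrix-vector multiplication per iteration.

N_COEFFS = 65e9            # eigenvector length (from the abstract)
BYTES_PER_COEFF = 8        # ASSUMPTION: IEEE double precision
N_MSPS = 432               # processors used (from the abstract)
SUSTAINED_TFLOPS = 3.4     # aggregate performance (from the abstract)
PEAK_FRACTION = 0.62       # fraction of peak (from the abstract)
MINUTES_PER_MATVEC = 4     # approximate, from the abstract
N_ITERATIONS = 25          # from the abstract
MATVECS_PER_ITERATION = 1  # ASSUMPTION

# Memory needed to hold a single vector, in gigabytes (10^9 bytes).
vector_gb = N_COEFFS * BYTES_PER_COEFF / 1e9

# Aggregate peak implied by "3.4 TFLOP/s is 62% of peak".
implied_peak_tflops = SUSTAINED_TFLOPS / PEAK_FRACTION

# The same peak expressed per MSP, in GFLOP/s.
implied_peak_per_msp_gflops = implied_peak_tflops * 1e3 / N_MSPS

# Rough time spent in matrix-vector products over the whole run.
total_matvec_minutes = MINUTES_PER_MATVEC * N_ITERATIONS * MATVECS_PER_ITERATION

if __name__ == "__main__":
    print(f"one vector:          {vector_gb:.0f} GB")
    print(f"implied peak:        {implied_peak_tflops:.2f} TFLOP/s")
    print(f"implied peak / MSP:  {implied_peak_per_msp_gflops:.1f} GFLOP/s")
    print(f"matvec time (total): {total_matvec_minutes} min")
```

Under these assumptions a single vector occupies about half a terabyte, which suggests why a single-vector method matters for memory and I/O. The implied per-MSP peak of roughly 12.7 GFLOP/s appears consistent with the peak rate usually quoted for a Cray X1 MSP.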