This paper discusses scalable implementations of sparse matrix-vector products, which are crucial for the high-performance solution of large-scale systems of linear equations, on the SGI Altix3700, a cc-NUMA machine. Three storage formats for sparse matrices are evaluated, and scalability is attained by implementations that take the page allocation mechanism of the NUMA machine into account. The influence of the cache and memory bus architectures on the optimum choice of storage format is examined, and scalable converters between storage formats are shown to facilitate the use of higher-performance storage formats.
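
As a rough illustration of the two ingredients named above, a storage-format converter and a sparse matrix-vector product, the following serial sketch converts a matrix from coordinate (COO) triplets to compressed sparse row (CSR) arrays and then computes y = A x. The abstract does not name the three formats that were evaluated, so the choice of COO and CSR is an assumption made here, and the function names `coo_to_csr` and `csr_matvec` are hypothetical. This is not the paper's implementation.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (row_ptr, col_idx, values).

    Assumed formats for illustration only; the source does not name them.
    """
    # Count the nonzeros in each row, then prefix-sum to get row offsets.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Scatter each entry into the next free slot of its row.
    col_idx = [0] * len(vals)
    values = [0.0] * len(vals)
    next_slot = list(row_ptr[:-1])
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        values[k] = v
        next_slot[r] += 1
    return row_ptr, col_idx, values


def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row i occupies positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix:
#   [4 0 1]
#   [0 3 0]
#   [2 0 5]
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]

row_ptr, col_idx, values = coo_to_csr(3, rows, cols, vals)
y = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```

The outer loop over rows is the natural unit of parallel work, since each row writes only its own entry of y. This sketch is serial and says nothing about page placement: controlling which NUMA node holds each block of rows requires a lower-level language and runtime than plain Python.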