Large-scale scientific applications frequently compute sparse matrix-vector products in their computational core, so techniques for computing these products efficiently on modern architectures are important. In this paper we describe a strategy for improving the performance of sparse matrix-vector product computations using a loop transformation known as unroll-and-jam. We describe a novel sparse matrix representation that enables us to apply this transformation. Our approach is best suited for sparse matrices whose rows have a small number of predictable lengths. This work was motivated by sparse matrices that arise in SAGE, an application from Los Alamos National Laboratory. We evaluate the performance benefits of our approach using sparse matrices produced by SAGE for a pair of sample inputs. We show that our strategy is effective for improving sparse matrix-vector product performance using these matrices on MIPS R12000, Alpha EV67, IBM Power 3, and Itanium 2 processors. Our measurements show that for this class of sparse matrices, our strategy improves sparse matrix-vector product performance by amounts ranging from a low of 41% on MIPS to well over a factor of 2 on Itanium.
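
To make the idea concrete, the following is a minimal sketch of unroll-and-jam applied to a sparse matrix-vector product. It is not the representation described in the paper, whose details the abstract does not give. It assumes a simple hypothetical layout in which rows are grouped by their number of nonzeros, so that every row in a group shares the same inner-loop trip count. The names `group_rows_by_length`, `spmv_unroll_and_jam`, and `spmv_reference` are illustrative only.

```python
from collections import defaultdict


def group_rows_by_length(rows):
    """Group rows by nonzero count (an assumed, illustrative layout).

    rows: one list of (column, value) pairs per matrix row.
    Returns {length: (row_ids, cols, vals)}, where cols and vals hold
    one list of `length` entries per row in the group.
    """
    groups = defaultdict(lambda: ([], [], []))
    for i, row in enumerate(rows):
        ids, cols, vals = groups[len(row)]
        ids.append(i)
        cols.append([c for c, _ in row])
        vals.append([v for _, v in row])
    return dict(groups)


def spmv_unroll_and_jam(groups, x, n_rows):
    """Compute y = A*x, processing two equal-length rows per inner loop."""
    y = [0.0] * n_rows
    for length, (ids, cols, vals) in groups.items():
        r = 0
        # Outer (row) loop unrolled by 2; the two inner loops are fused
        # ("jammed") into one because both rows have the same length.
        while r + 1 < len(ids):
            c0, c1 = cols[r], cols[r + 1]
            v0, v1 = vals[r], vals[r + 1]
            s0 = s1 = 0.0
            for k in range(length):
                s0 += v0[k] * x[c0[k]]
                s1 += v1[k] * x[c1[k]]
            y[ids[r]] = s0
            y[ids[r + 1]] = s1
            r += 2
        # Remainder: a single leftover row when the group size is odd.
        if r < len(ids):
            s = 0.0
            for k in range(length):
                s += vals[r][k] * x[cols[r][k]]
            y[ids[r]] = s
    return y


def spmv_reference(rows, x):
    """Straightforward row-by-row product, used only for comparison."""
    return [sum(v * x[c] for c, v in row) for row in rows]
```

Grouping by row length is what makes the jammed loop legal here: without it, the two rows handled in one outer iteration could have different trip counts. In a compiled language, the fused body exposes more independent multiply-adds per iteration, which is the kind of benefit unroll-and-jam is generally intended to provide. In Python the sketch only illustrates the loop structure.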