Improving the memory-system performance of sparse-matrix vector multiplication

Authors: S. Toledo
Venue: IBM Journal of Research and Development
Year: 1997

Citing 18
Cited 43

The effect of ordering on preconditioned conjugate gradients

BIT
An effective on-chip preloading scheme to reduce data access penalty

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A high performance algorithm using pre-processing for the sparse matrix-vector multiplication

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Characterizing the behavior of sparse algorithms on caches

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch

IBM Journal of Research and Development
POWER2: next generation of the RISC System/6000 family

IBM Journal of Research and Development
The POWER2 performance monitor

IBM Journal of Research and Development
High-performance parallel implementations of the NAS kernel benchmarks on the IBM SP2

IBM Systems Journal
Tolerating latency through software-controlled data prefetching

Tolerating latency through software-controlled data prefetching
Fast and effective algorithms for graph partitioning and sparse-matrix ordering

IBM Journal of Research and Development - Special issue: optical lithography I
Basic Linear Algebra Subprograms for Fortran Usage

ACM Transactions on Mathematical Software (TOMS)
Superscalar Instruction Execution in the 21164 Alpha Microprocessor

IEEE Micro
UltraSparc I: A Four-Issue Processor Supporting Multimedia

IEEE Micro
Reducing the bandwidth of sparse symmetric matrices

ACM '69 Proceedings of the 1969 24th national conference
Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems
The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors

The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors
The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors

The effectiveness of caches and data prefetch buffers in large-scale shared memory multiprocessors
Software methods for improvement of cache performance on supercomputer applications

Software methods for improvement of cache performance on supercomputer applications

An Improved Computation of the PageRank Algorithm

Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
Adaptive on-line page importance computation

WWW '03 Proceedings of the 12th international conference on World Wide Web
Self-adapting software for numerical linear algebra and LAPACK for clusters

Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Sparse Matrix-Vector multiplication on FPGAs

Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays
Sparsity: Optimization Framework for Sparse Matrix Kernels

International Journal of High Performance Computing Applications
Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam

International Journal of High Performance Computing Applications
Calibrating quantum chemistry: A multi-teraflop, parallel-vector, full-configuration interaction program for the Cray-X1

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Performance Evaluation of a Parallel Iterative Method Library using OpenMP

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Fast transpose methods for kernel learning on sparse data

ICML '06 Proceedings of the 23rd international conference on Machine learning
Accelerating sparse matrix computations via data compression

Proceedings of the 20th annual international conference on Supercomputing
An operation stacking framework for large ensemble computations

Proceedings of the 21st annual international conference on Supercomputing
Optimal polarity for dual Reed-Muller expressions

EHAC'07 Proceedings of the 6th WSEAS International Conference on Electronics, Hardware, Wireless and Optical Communications
Optimizing sparse matrix-vector multiplication using index and value compression

Proceedings of the 5th conference on Computing frontiers
Optimal polarity for dual Reed-Muller expressions

MINO'08 Proceedings of the 7th WSEAS International Conference on Microelectronics, Nanoelectronics, Optoelectronics
Pattern-based sparse matrix representation for memory-efficient SMVM kernels

Proceedings of the 23rd international conference on Supercomputing
Evaluation of Hierarchical Mesh Reorderings

ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Performance evaluation of the sparse matrix-vector multiplication on modern architectures

The Journal of Supercomputing
Parallel symmetric sparse matrix-vector product on scalar multi-core CPUs

Parallel Computing
Parallel blocked sparse matrix-vector multiplication with dynamic parameter selection method

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
Self-adapting software for numerical linear algebra library routines on clusters

ICCS'03 Proceedings of the 2003 international conference on Computational science: Part III
Operation Stacking for Ensemble Computations With Variable Convergence

International Journal of High Performance Computing Applications
Increasing the Locality of Iterative Methods and Its Application to the Simulation of Semiconductor Devices

International Journal of High Performance Computing Applications
Improving the performance of tensor matrix vector multiplication in cumulative reaction probability based quantum chemistry codes

HiPC'08 Proceedings of the 15th international conference on High performance computing
Cache friendly sparse matrix-vector multiplication

Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Exploiting compression opportunities to improve SpMxV performance on shared memory systems

ACM Transactions on Architecture and Code Optimization (TACO)
Hierarchical Diagonal Blocking and Precision Reduction Applied to Combinatorial Multigrid

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Performance evaluation of parallel sparse matrix-vector products on SGI Altix3700

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
On improving performance and energy profiles of sparse scientific applications

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Conjugate gradient sparse solvers: performance-power characteristics

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Cache friendly sparse matrix-vector multiplication

ACM Communications in Computer Algebra
CSX: an extended compression format for spmv on shared memory systems

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Exploiting dense substructures for fast sparse matrix vector multiplication

International Journal of High Performance Computing Applications
"Wide or tall" and "sparse matrix dense matrix" multiplications

Proceedings of the 19th High Performance Computing Symposia
Two-dimensional cache-oblivious sparse matrix-vector multiplication

Parallel Computing
An object-oriented bulk synchronous parallel library for multicore programming

Concurrency and Computation: Practice & Experience
Sparse matrix-vector multiply on the HICAMP architecture

Proceedings of the 26th ACM international conference on Supercomputing
Automatic tuning of the sparse matrix vector product on GPUs based on the ELLR-T approach

Parallel Computing
Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor

ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Fast Recommendation on Bibliographic Networks

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Scaling LAPACK panel operations using parallel cache assignment

ACM Transactions on Mathematical Software (TOMS)
