Sparse matrix-vector multiply on the HICAMP architecture

Authors: John P. Stevenson (Stanford University, Palo Alto, CA, USA); Amin Firoozshahian (HICAMP Systems, Menlo Park, CA, USA); Alex Solomatnikov (HICAMP Systems, Menlo Park, CA, USA); Mark Horowitz (Stanford University, Palo Alto, CA, USA); David Cheriton (Stanford University & HICAMP Systems, Palo Alto, CA, USA)
Venue: Proceedings of the 26th ACM International Conference on Supercomputing
Year: 2012

Abstract

Sparse matrix-vector multiply (SpMV) is a critical kernel in the inner loop of modern iterative linear-system solvers, and it exhibits very little data reuse. Because of this low reuse, its performance is bounded by main-memory bandwidth, and its irregular, indirect access patterns make even that bound difficult to reach. We present sparse matrix storage formats based on deduplicated memory. These formats reduce memory traffic during SpMV and therefore raise the performance bound significantly: by up to 90x in the best case. We also introduce a matrix format that inherently exploits any degree of matrix symmetry while remaining fully compatible with code written for non-symmetric matrices. As a result, our method can operate concurrently on a symmetric matrix without complicated work-partitioning schemes and without any thread synchronization or locking. This approach takes advantage of growing processor caches but incurs an instruction-count overhead. That overhead can be overcome with specialized hardware, as shown by the recently proposed Hierarchical Immutable Content-Addressable Memory Processor (HICAMP) architecture.
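The two ideas above, that SpMV streams its matrix with no reuse through indirect accesses and that deduplicated memory can shrink that stream, can be illustrated with a small Python sketch. With 8-byte values and 4-byte column indices, compressed sparse row (CSR) storage moves at least 12 bytes of matrix data per nonzero for only 2 floating-point operations, which is why the kernel is bandwidth-bound. This is a software analogy under stated assumptions, not the paper's formats: `csr_spmv`, `dedup_lines`, `rebuild`, and the fixed `line_len` are hypothetical names and parameters, and a dictionary keyed by line content stands in for the content-addressable lookup that HICAMP performs in hardware.

```python
from typing import Dict, List, Sequence, Tuple


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Each matrix entry is read exactly once (no reuse), and x is
            # reached only through col_idx[k]: the indirect access of SpMV.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dedup_lines(data: Sequence[float],
                line_len: int) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Split data into fixed-size lines and keep each distinct line once.

    Hypothetical software stand-in for deduplicated memory: the dict maps
    line content to the id of its single stored copy.
    """
    index: Dict[Tuple[float, ...], int] = {}
    unique: List[Tuple[float, ...]] = []
    ids: List[int] = []
    for start in range(0, len(data), line_len):
        line = tuple(data[start:start + line_len])
        if line not in index:        # first time this content is seen
            index[line] = len(unique)
            unique.append(line)
        ids.append(index[line])      # duplicates reuse the existing copy
    return unique, ids


def rebuild(unique: Sequence[Tuple[float, ...]], ids: Sequence[int]) -> List[float]:
    """Reconstruct the original array from its deduplicated lines."""
    return [v for i in ids for v in unique[i]]


if __name__ == "__main__":
    # 8x8 circulant matrix: row i has 4.0 at column i and -1.0 at column
    # (i + 1) % 8. Column indices need not be sorted for this kernel.
    n = 8
    row_ptr = [2 * i for i in range(n + 1)]
    col_idx = [c for i in range(n) for c in (i, (i + 1) % n)]
    values = [4.0, -1.0] * n
    x = [float(i + 1) for i in range(n)]

    print(csr_spmv(row_ptr, col_idx, values, x))
    # [2.0, 5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 31.0]

    unique, ids = dedup_lines(values, line_len=4)
    print(len(values), len(unique), ids)
    # 16 1 [0, 0, 0, 0]
    assert rebuild(unique, ids) == values
```

In this example the 16 stored values collapse to one distinct 4-value line plus four references to it, so a traversal that reads through a deduplicating cache would fetch far less distinct data. The column indices here form distinct lines and would not shrink, and matrices without repeated content gain little; the sketch also says nothing about how the paper lays out indices or exploits symmetry.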