A streaming floating-point sparse-matrix compression scheme, which forms a key element of an accelerator for finite-element and other linear algebra applications, is described. The proposed architecture targets Sparse Matrix-Vector Multiplication (SMVM), the performance-limiting operation at the heart of finite-element applications. It combines a dedicated datapath optimized for these applications with a streaming data-compression and decompression unit that increases the effective memory bandwidth seen by the datapath. The proposed format uses variable-length entries, each containing an opcode and, optionally, an address and/or a non-zero value. System simulations using a cycle-accurate C++ architectural model and a database of over 400 large symmetric and unsymmetric matrices containing up to 20M non-zero elements (226M non-zeros in total) demonstrate that the proposed architecture achieves an average improvement of 20% in effective memory bandwidth over published work, for a modest increase in hardware resources.
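To make the idea of variable-length entries concrete, the following is a minimal software sketch of a stream in which every entry starts with an opcode and carries an address and/or a value only when needed. The opcode bits (`HAS_ADDR`, `HAS_VALUE`, `END_ROW`), the 4-byte column and 8-byte value fields, and the rules "column defaults to previous column plus one" and "value defaults to the previous value" are all assumptions chosen for illustration. They are not the encoding used by the proposed architecture, which the abstract does not specify.

```python
import struct

# Hypothetical opcode bits (illustrative assumptions, not the actual format).
HAS_ADDR = 0x1   # a 4-byte column index follows the opcode
HAS_VALUE = 0x2  # an 8-byte double follows (after the address, if any)
END_ROW = 0x4    # opcode-only entry: advance to the next row


def encode(rows):
    """Encode rows of (col, value) pairs, sorted by col, into a byte stream."""
    out = bytearray()
    prev_val = None
    for row in rows:
        prev_col = -1
        for col, val in row:
            op = 0
            payload = b""
            if col != prev_col + 1:      # address is omitted for adjacent columns
                op |= HAS_ADDR
                payload += struct.pack("<I", col)
            if val != prev_val:          # value is omitted when it repeats
                op |= HAS_VALUE
                payload += struct.pack("<d", val)
            out.append(op)
            out += payload
            prev_col, prev_val = col, val
        out.append(END_ROW)
    return bytes(out)


def decode(stream):
    """Yield (row, col, value) triples from an encoded stream."""
    pos = 0
    row, prev_col, prev_val = 0, -1, 0.0
    while pos < len(stream):
        op = stream[pos]
        pos += 1
        if op & END_ROW:
            row += 1
            prev_col = -1
            continue
        if op & HAS_ADDR:
            (col,) = struct.unpack_from("<I", stream, pos)
            pos += 4
        else:
            col = prev_col + 1
        if op & HAS_VALUE:
            (val,) = struct.unpack_from("<d", stream, pos)
            pos += 8
        else:
            val = prev_val
        yield row, col, val
        prev_col, prev_val = col, val


def spmv(stream, x, n_rows):
    """Compute y = A @ x by consuming the compressed stream entry by entry."""
    y = [0.0] * n_rows
    for r, c, v in decode(stream):
        y[r] += v * x[c]
    return y


# Small tridiagonal example matrix, stored row by row.
rows = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(1, -1.0), (2, 2.0)],
]
stream = encode(rows)
y = spmv(stream, [1.0, 2.0, 3.0], n_rows=3)
```

In this sketch the seven non-zeros occupy 54 bytes, compared with 84 bytes for seven uncompressed (4-byte index, 8-byte value) pairs before any row pointers are counted. The decoder reads the stream strictly front to back, which is the property that lets a hardware decompression unit sit between memory and the datapath.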