Exact sparse matrix-vector multiplication on GPU's and multicore architectures
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Parallel implementation of conjugate gradient method on graphics processors
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Sparse matrix-vector multiplication (SpM×V for short) is one of the most common subroutines in numerical linear algebra. The problem is that memory access patterns during SpM×V are irregular, so cache utilization can suffer from low spatial or temporal locality. This paper introduces a new approach to accelerating SpM×V, consisting of three steps. The first step divides the whole matrix into smaller parts (regions) that can fit in the cache. The second step improves locality during the multiplication through better utilization of distant references. The last step maximizes the machine's computational performance on the partial multiplication within each region. We describe each of these three steps in more detail, including fast and inexpensive algorithms for all of them. Our measurements show that this approach yields a significant speedup for almost all matrices arising from various technical areas.
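To illustrate the first step, the sketch below shows a cache-blocked SpM×V in pure Python (the function names `csr_from_dense` and `spmv_blocked` and the band-partitioning scheme are illustrative assumptions, not the paper's actual implementation): the matrix is stored in CSR form and multiplied one column band at a time, so each pass touches only a band-sized slice of the input vector `x`, mimicking a cache-resident working set.

```python
def csr_from_dense(A):
    # Build CSR arrays (values, column indices, row pointers)
    # from a dense list-of-lists matrix.
    vals, cols, ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(float(v))
                cols.append(j)
        ptr.append(len(vals))
    return vals, cols, ptr

def spmv_blocked(vals, cols, ptr, x, band=2):
    # Compute y = A*x by column bands of width `band`: within one
    # band, every access to x stays inside a small contiguous slice,
    # which is the region-sized working set the paper's step 1 aims for.
    # (A real implementation would pre-split the matrix into per-band
    # sub-matrices instead of rescanning all nonzeros per band.)
    n_rows = len(ptr) - 1
    n_cols = max(cols) + 1 if cols else 0
    y = [0.0] * n_rows
    for start in range(0, n_cols, band):
        end = start + band
        for i in range(n_rows):
            acc = 0.0
            for k in range(ptr[i], ptr[i + 1]):
                if start <= cols[k] < end:
                    acc += vals[k] * x[cols[k]]
            y[i] += acc
    return y

# Usage example on a small 3x4 sparse matrix:
A = [[1, 0, 2, 0],
     [0, 3, 0, 4],
     [5, 0, 0, 0]]
vals, cols, ptr = csr_from_dense(A)
y = spmv_blocked(vals, cols, ptr, [1.0, 2.0, 3.0, 4.0], band=2)
# y == [7.0, 22.0, 5.0], the same result as an unblocked multiply
```

The band width plays the role of the region size: in practice it would be chosen so the corresponding slice of `x` (and the region's nonzeros) fit in a target cache level.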