International Journal of High Performance Computing Applications
To exploit the available parallelism efficiently, multicore processors must address contention for shared resources such as the cache hierarchy. This becomes even more important when they execute irregular codes, as is the case for sparse matrix codes. This paper presents a technique for increasing the locality of sparse matrix codes on multicore platforms. The technique consists of reorganizing the data guided by a locality model that introduces the concept of windows of locality. The reordering technique has been evaluated on two leading multicore platforms: Intel Core2Duo and Intel Xeon. Experimental results show significant performance improvements when using the reordered matrices instead of the original ones. In particular, an average execution time reduction of about 30% is achieved across different numbers of running threads. These improvements are due to better overall cache behavior. The paper also compares the proposal with several standard reordering techniques. The results indicate that the proposed reordering technique always outperforms the standard algorithms and is effective for matrices with any structure.
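To illustrate what reordering a sparse matrix for locality means in practice, here is a minimal sketch. It is not the locality model or the reordering algorithm of the paper. It assumes a matrix in CSR format and a hand-picked column permutation, and it uses a deliberately crude locality proxy (the number of fixed-size blocks of the input vector touched per row). All function names and the `block_size` parameter are hypothetical.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x for A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def permute_columns(row_ptr, col_idx, values, perm):
    """Relabel columns with perm[old] = new, keeping each row sorted by column."""
    new_col, new_val = [], []
    for i in range(len(row_ptr) - 1):
        entries = sorted(
            (perm[col_idx[k]], values[k])
            for k in range(row_ptr[i], row_ptr[i + 1])
        )
        new_col.extend(c for c, _ in entries)
        new_val.extend(v for _, v in entries)
    return list(row_ptr), new_col, new_val


def permute_vector(x, perm):
    """Move x[old] to position perm[old] so it matches the relabelled columns."""
    out = [0.0] * len(x)
    for old, new in enumerate(perm):
        out[new] = x[old]
    return out


def blocks_touched(row_ptr, col_idx, block_size):
    """Crude locality proxy (an assumption, not the paper's model):
    total number of distinct x-blocks of block_size entries read per row."""
    total = 0
    for i in range(len(row_ptr) - 1):
        cols = col_idx[row_ptr[i]:row_ptr[i + 1]]
        total += len({c // block_size for c in cols})
    return total


# 4x4 example: every row reads two entries of x that are far apart.
row_ptr = [0, 2, 4, 6, 8]
col_idx = [0, 3, 1, 2, 0, 3, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.0, 2.0, 3.0, 4.0]

# Hand-picked permutation (old column -> new column) that groups the
# columns used together: 0 -> 0, 3 -> 1, 1 -> 2, 2 -> 3.
perm = [0, 2, 3, 1]

r_ptr, r_col, r_val = permute_columns(row_ptr, col_idx, values, perm)
x_perm = permute_vector(x, perm)

y_original = spmv_csr(row_ptr, col_idx, values, x)
y_reordered = spmv_csr(r_ptr, r_col, r_val, x_perm)

before = blocks_touched(row_ptr, col_idx, block_size=2)  # 8
after = blocks_touched(r_ptr, r_col, block_size=2)       # 4
```

In this sketch the product is unchanged by the reordering, since only the columns and the input vector are relabelled, while each row now reads entries of `x` that sit next to each other. A real reordering technique, such as the one evaluated in the paper, has to find a suitable permutation automatically.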