Performance optimization of irregular codes based on the combination of reordering and blocking techniques

Authors:
J. C. Pichel;D. B. Heras;J. C. Cabaleiro;F. F. Rivera
Affiliations:
Dept. Electrónica e Computación, Universidade de Santiago de Compostela, Santiago de Compostela, Spain;Dept. Electrónica e Computación, Universidade de Santiago de Compostela, Santiago de Compostela, Spain;Dept. Electrónica e Computación, Universidade de Santiago de Compostela, Santiago de Compostela, Spain;Dept. Electrónica e Computación, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Venue:
Parallel Computing
Year:
2005

Abstract

This paper studies the combination of data reordering techniques with classic code restructuring techniques to increase locality in the execution of sparse algebra codes. The reordering techniques first model locality at run time and then apply a heuristic to increase it. Next, register blocking, a code restructuring technique specifically tuned for sparse algebra codes, is applied. The code studied is the product of a sparse matrix by a dense vector (SpM × V), evaluated on different monoprocessors and distributed memory multiprocessors. The combination of both techniques was tested on a broad set of matrices from real problems and known repositories. The results, expressed in terms of execution time, show that an adequate reordering of the data improves the efficiency of register blocking and therefore reduces the execution time of the sparse algebra code considered.
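
As a rough illustration of the idea in the abstract, the following sketch contrasts a plain CSR SpM × V kernel with a register-blocked (BCSR-style) one, and shows how a reordering can reduce the explicit zeros that blocking has to store. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2 × 2 block size, the helper names (`csr_from_dense`, `bcsr_from_dense`, `fill_ratio`, `permute_symmetric`), and the hand-picked permutation are all hypothetical. In particular, the permutation here is chosen by inspection, whereas the paper derives its reordering from a run-time locality model and a heuristic.

```python
def csr_from_dense(a):
    """Build CSR arrays (values, column indices, row pointers) from dense rows."""
    vals, cols, ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                vals.append(v)
                cols.append(j)
        ptr.append(len(vals))
    return vals, cols, ptr


def spmv_csr(vals, cols, ptr, x):
    """y = A x with A in CSR: one indirect access to x per stored nonzero."""
    y = [0.0] * (len(ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(ptr[i], ptr[i + 1]):
            acc += vals[k] * x[cols[k]]
        y[i] = acc
    return y


def bcsr_from_dense(a, r=2, c=2):
    """Store every r x c block holding at least one nonzero, padded with zeros.

    Assumption: the matrix dimensions are multiples of r and c.
    """
    blocks, bcols, bptr = [], [], [0]
    for bi in range(0, len(a), r):
        for bj in range(0, len(a[0]), c):
            block = [a[bi + di][bj + dj] for di in range(r) for dj in range(c)]
            if any(v != 0 for v in block):
                blocks.append(block)
                bcols.append(bj)
        bptr.append(len(blocks))
    return blocks, bcols, bptr


def spmv_bcsr(blocks, bcols, bptr, x, r=2, c=2):
    """y = A x with A in blocked form: one index lookup per block, not per entry."""
    y = [0.0] * ((len(bptr) - 1) * r)
    for bi in range(len(bptr) - 1):
        acc = [0.0] * r  # stands in for the registers holding r partial sums
        for k in range(bptr[bi], bptr[bi + 1]):
            xs = x[bcols[k]:bcols[k] + c]  # c entries of x reused by r rows
            blk = blocks[k]
            for di in range(r):
                for dj in range(c):
                    acc[di] += blk[di * c + dj] * xs[dj]
        y[bi * r:(bi + 1) * r] = acc
    return y


def fill_ratio(blocks, nnz, r=2, c=2):
    """Stored entries (including padding zeros) divided by true nonzeros."""
    return len(blocks) * r * c / nnz


def permute_symmetric(a, p):
    """Apply the same permutation p to the rows and columns of a."""
    return [[a[i][j] for j in p] for i in p]


# Scattered pattern: every 2 x 2 block is only half full.
A = [[1, 0, 2, 0],
     [0, 3, 0, 4],
     [5, 0, 6, 0],
     [0, 7, 0, 8]]
# Hand-picked permutation (hypothetical) that clusters the nonzeros.
B = permute_symmetric(A, [0, 2, 1, 3])
nnz = len(csr_from_dense(A)[0])
print(fill_ratio(bcsr_from_dense(A)[0], nnz))  # padding before reordering
print(fill_ratio(bcsr_from_dense(B)[0], nnz))  # padding after reordering
```

In this toy case the reordered matrix needs fewer blocks and no padding zeros, which is the kind of effect that would let register blocking do less wasted work. Whether that translates into lower execution time on a given machine depends on the matrix and the memory hierarchy, which is what the paper measures.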