Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology
ICS '97 Proceedings of the 11th international conference on Supercomputing
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving memory hierarchy performance for irregular applications
ICS '99 Proceedings of the 13th international conference on Supercomputing
A Supernodal Approach to Sparse Partial Pivoting
SIAM Journal on Matrix Analysis and Applications
Efficient compiler and run-time support for parallel irregular reductions
Parallel Computing - special issue on parallel computing for irregular applications
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Localizing Non-Affine Array References
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Optimizing the performance of sparse matrix-vector multiplication
Optimizing the performance of sparse matrix-vector multiplication
Guiding program transformations with modal performance models
Guiding program transformations with modal performance models
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Sparse Tiling for Stationary Iterative Methods
International Journal of High Performance Computing Applications
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatically enhancing locality for tree traversals with traversal splicing
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Exploiting domain knowledge to optimize parallel computational mechanics codes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
In modern computer architecture the use of memory hierarchies causes a program's data locality to directly affect performance. Data locality occurs when a piece of data is still in a cache upon reuse. For dense matrix computations, loop transformations can be used to improve data locality. However, sparse matrix computations have nonaffine loop bounds and indirect memory references which prohibit the use of compile time loop transformations. This paper describes an algorithm to tile at runtime called serial sparse tiling. We test a runtime tiled version of sparse Gauss-Seidel on 4 different architectures where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.