Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Principles of runtime support for parallel processors
ICS '88 Proceedings of the 2nd international conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A multicolour SOR method for the finite-element method
Journal of Computational and Applied Mathematics
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Nonlinear array dependence analysis
Nonlinear array dependence analysis
A unifying framework for iteration reordering transformations
A unifying framework for iteration reordering transformations
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Architecture-cognizant divide and conquer algorithms
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
Proceedings of the 14th international conference on Supercomputing
An axiomatic basis for computer programming
Communications of the ACM
Efficient compiler and run-time support for parallel irregular reductions
Parallel Computing - special issue on parallel computing for irregular applications
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A distributed memory unstructured gauss-seidel algorithm for multigrid smoothers
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Increasing temporal locality with skewing and recursive blocking
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Rescheduling for Locality in Sparse Matrix Computations
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Cache-Efficient Multigrid Algorithms
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Finding Legal Reordering Transformations Using Mappings
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Localizing Non-Affine Array References
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Optimizing the performance of sparse matrix-vector multiplication
Optimizing the performance of sparse matrix-vector multiplication
Performance transformations for irregular applications
Performance transformations for irregular applications
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
A modified version of Jacobi approach
International Journal of Innovative Computing and Applications
International Journal of High Performance Computing Applications
Optimization of sparse matrix-vector multiplication using reordering techniques on GPUs
Microprocessors & Microsystems
Parallelizing SOR for GPGPUs using alternate loop tiling
Parallel Computing
An Efficient Parallel Implementation for Three-Dimensional Incompressible Pipe Flow Based on SIMPLE
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Concurrency and Computation: Practice & Experience
Applications of the streamed storage format for sparse matrix operations
International Journal of High Performance Computing Applications
Hi-index | 0.00 |
In modern computers, a program's data locality can affect performance significantly. This paper details full sparse tiling, a run-time reordering transformation that improves the data locality for stationary iterative methods such as Gauss-Seidel operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate the execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix, and then an order of execution that achieves better data locality. We prove that full sparse-tiled Gauss-Seidel generates a solution that is bitwise identical to traditional Gauss-Seidel on the permuted matrix. We also present measurements of the performance improvements and the overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al