Sparse Tiling for Stationary Iterative Methods

Authors:
Michelle Mills Strout;Larry Carter;Jeanne Ferrante;Barbara Kreaseck
Affiliations:
Argonne National Laboratory;University of California, San Diego;University of California, San Diego;La Sierra University
Venue:
International Journal of High Performance Computing Applications
Year:
2004

Abstract

In modern computers, a program's data locality can significantly affect performance. This paper details full sparse tiling, a run-time reordering transformation that improves data locality for stationary iterative methods, such as Gauss-Seidel, operating on sparse matrices. In scientific applications such as finite element analysis, these iterative methods dominate execution time. Full sparse tiling chooses a permutation of the rows and columns of the sparse matrix and then an order of execution that achieves better data locality. We prove that full sparse-tiled Gauss-Seidel generates a solution that is bitwise identical to that of traditional Gauss-Seidel on the permuted matrix. We also present measurements of the performance improvements and overheads of full sparse tiling and of cache blocking for irregular grids, a related technique developed by Douglas et al.
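
To make the setting concrete, the following is a minimal sketch of the baseline the abstract refers to: ordinary Gauss-Seidel sweeps over a sparse matrix, plus a symmetric permutation of its rows and columns. It is not the full sparse tiling algorithm; how the paper chooses the permutation and the tiled execution order is not reproduced here. The compressed sparse row (CSR) layout, the function names `gauss_seidel_csr` and `permute_symmetric`, and the small tridiagonal test matrix are all assumptions made for illustration.

```python
def gauss_seidel_csr(row_ptr, col_idx, vals, b, x, sweeps):
    """Run in-place Gauss-Seidel sweeps for A x = b, with A stored in CSR form."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            diag = 0.0
            acc = b[i]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j == i:
                    diag = vals[k]          # remember the diagonal entry a_ii
                else:
                    acc -= vals[k] * x[j]   # uses already-updated x[j] when j < i
            x[i] = acc / diag
    return x


def permute_symmetric(row_ptr, col_idx, vals, b, perm):
    """Return CSR arrays of P A P^T and the vector P b.

    Row `r` of the result is row `perm[r]` of the input (hypothetical convention).
    """
    n = len(b)
    inv = [0] * n
    for new, old in enumerate(perm):
        inv[old] = new
    new_ptr, new_col, new_val = [0], [], []
    for new in range(n):
        old = perm[new]
        entries = sorted(
            (inv[col_idx[k]], vals[k])
            for k in range(row_ptr[old], row_ptr[old + 1])
        )
        for c, v in entries:
            new_col.append(c)
            new_val.append(v)
        new_ptr.append(len(new_col))
    new_b = [b[old] for old in perm]
    return new_ptr, new_col, new_val, new_b


# Illustrative 4x4 tridiagonal system (diagonal 4, off-diagonals -1).
row_ptr = [0, 2, 5, 8, 10]
col_idx = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
x_true = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 13.0]  # b = A @ x_true

# Gauss-Seidel in the original ordering.
x = gauss_seidel_csr(row_ptr, col_idx, vals, b, [0.0] * 4, sweeps=60)

# Gauss-Seidel on the symmetrically permuted system.
perm = [2, 0, 3, 1]
p_ptr, p_col, p_val, p_b = permute_symmetric(row_ptr, col_idx, vals, b, perm)
y = gauss_seidel_csr(p_ptr, p_col, p_val, p_b, [0.0] * 4, sweeps=60)
```

Both runs converge to the same solution up to reordering, since `y[r]` corresponds to `x[perm[r]]`. The intermediate iterates differ, though, because Gauss-Seidel depends on the order in which unknowns are updated. That is why the abstract's equivalence claim is stated relative to traditional Gauss-Seidel on the permuted matrix rather than on the original one.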