Combining performance aspects of irregular Gauss-Seidel via sparse tiling

Authors:
Michelle Mills Strout;Larry Carter;Jeanne Ferrante;Jonathan Freeman;Barbara Kreaseck
Affiliations:
University of California, San Diego, CA;University of California, San Diego, CA;University of California, San Diego, CA;University of California, San Diego, CA;University of California, San Diego, CA
Venue:
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Year:
2002

Abstract

Finite element problems are often solved using multigrid techniques. The most time-consuming part of multigrid is the iterative smoother, such as Gauss-Seidel. To improve performance, iterative smoothers can exploit parallelism, intra-iteration data reuse, and inter-iteration data reuse. Current methods for parallelizing Gauss-Seidel on irregular grids, such as multi-coloring and owner-computes-based techniques, exploit parallelism and possibly intra-iteration data reuse, but not inter-iteration data reuse. Sparse tiling techniques were developed to improve intra-iteration and inter-iteration data locality in iterative smoothers. This paper describes how sparse tiling can additionally provide parallelism. Our results show the effectiveness of Gauss-Seidel parallelized with sparse tiling techniques on shared memory machines, in particular when compared with owner-computes-based Gauss-Seidel methods, which exploit only parallelism and intra-iteration locality. Our results support the premise that the best performance is obtained when all three performance aspects (parallelism, intra-iteration data locality, and inter-iteration data locality) are combined.
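To make the two kinds of data reuse concrete, the following is a minimal sketch of a plain, serial Gauss-Seidel smoother over a sparse matrix. It is not the sparse tiling scheme of the paper. The compressed sparse row (CSR) layout, the function names (`gauss_seidel_sweep`, `smooth`, `laplacian_1d_csr`), and the small 1D Laplacian test matrix are all assumptions chosen for illustration. Within one sweep, each `x[j]` is read by every row that has a nonzero in column `j` (intra-iteration reuse); across sweeps, the whole vector `x` is read and written again (inter-iteration reuse).

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """Perform one in-place Gauss-Seidel sweep on a CSR matrix; returns x."""
    n = len(b)
    for i in range(n):
        sigma = 0.0
        diag = None
        # Walk the nonzeros of row i. Entries x[j] with j < i were already
        # updated in this sweep, which is what creates the dependences that
        # make parallelization nontrivial.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            else:
                sigma += vals[k] * x[j]
        if diag is None or diag == 0.0:
            raise ValueError(f"row {i} has no usable diagonal entry")
        x[i] = (b[i] - sigma) / diag
    return x


def smooth(row_ptr, col_idx, vals, b, x, sweeps):
    """Apply several sweeps; x is reused from one sweep to the next."""
    for _ in range(sweeps):
        gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
    return x


def laplacian_1d_csr(n):
    """Build the tridiagonal matrix tridiag(-1, 2, -1) in CSR form (test data)."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        if i > 0:
            col_idx.append(i - 1)
            vals.append(-1.0)
        col_idx.append(i)
        vals.append(2.0)
        if i < n - 1:
            col_idx.append(i + 1)
            vals.append(-1.0)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


if __name__ == "__main__":
    row_ptr, col_idx, vals = laplacian_1d_csr(4)
    b = [1.0, 0.0, 0.0, 1.0]  # right-hand side whose exact solution is all ones
    x = smooth(row_ptr, col_idx, vals, b, [0.0] * 4, sweeps=200)
    print(x)
```

On an irregular grid the matrix rows have no such regular pattern, so the order in which rows are visited determines both which updates can run concurrently and how often a value of `x` is still in cache when it is needed again. That ordering is the degree of freedom the techniques named in the abstract act on.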