Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods such as Gauss-Seidel and Red-Black Gauss-Seidel form the principal computational component of multigrid, and thus affect its efficiency. Within multigrid, these iterative solvers run for only a small number of iterations (2 to 8). We exploit this property to develop a cache-efficient multigrid by improving the memory behavior of the linear relaxation methods. The efficiency of our cache-efficient linear relaxation algorithm comes from two sources: fewer data cache and TLB misses, and fewer memory references because values are kept register-resident. Experiments on five modern computing platforms show speedups of 1.15 to 2.7 times over a standard implementation of the Full Multigrid V-Cycle.
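To make the memory-behavior idea concrete, here is a minimal sketch of one well-known way to cut memory traffic in Red-Black Gauss-Seidel: fusing the red and black passes so the grid is streamed through once per sweep instead of twice. It is not taken from the paper, whose algorithm is not described in this abstract. The 5-point stencil on a square 2D grid, the fixed boundary values, and the names `update_row`, `rb_sweep_standard`, and `rb_sweep_fused` are all assumptions made for illustration.

```python
def update_row(u, f, h2, i, color):
    """Relax the points of one color in interior row i (5-point stencil)."""
    n = len(u)
    # First interior column j in this row with (i + j) % 2 == color.
    start = 1 + ((i + 1 + color) % 2)
    for j in range(start, n - 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1]
                          + h2 * f[i][j])


def rb_sweep_standard(u, f, h2):
    """Two full passes over the grid: all red points, then all black points."""
    n = len(u)
    for color in (0, 1):  # 0 = red, 1 = black (hypothetical convention)
        for i in range(1, n - 1):
            update_row(u, f, h2, i, color)


def rb_sweep_fused(u, f, h2):
    """One pass: after the red points of row i, relax the black points of row i-1.

    The black points of row i-1 only need red values from rows i-2, i-1 and i,
    which are already updated, so the result matches the standard sweep while
    each row is revisited when it is still likely to be in cache.
    """
    n = len(u)
    for i in range(1, n - 1):
        update_row(u, f, h2, i, 0)
        if i > 1:
            update_row(u, f, h2, i - 1, 1)
    update_row(u, f, h2, n - 2, 1)  # black points of the last interior row
```

Because both functions apply the same updates to the same values in a dependence-respecting order, they should produce the same grid after any number of sweeps. The fused form only changes when each row is touched, which is the kind of reordering that pays off when just a few relaxation iterations are run.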