Multigrid is widely used as an efficient solver for sparse linear systems arising from the discretization of elliptic boundary value problems. Linear relaxation methods such as Gauss-Seidel and Red-Black Gauss-Seidel form the principal computational component of multigrid and therefore affect its efficiency. Within multigrid, these iterative solvers run for only a small number of iterations (2-8). We exploit this property to develop a cache-efficient multigrid method by improving the memory behavior of the linear relaxation methods. The efficiency of our cache-efficient linear relaxation algorithm comes from two sources: fewer data cache and TLB misses, and fewer memory references, achieved by keeping values register-resident. Our optimizations apply to multigrid for linear systems arising from constant-coefficient elliptic PDEs on structured grids. Experiments on five modern computing platforms show a performance improvement of 1.15 to 2.7 times over a standard implementation of the Full Multigrid V-Cycle.
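To make the idea of improving the memory behavior of a relaxation method concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a 2D Poisson model problem with a 5-point stencil and zero Dirichlet boundary values, and every name in it (`update_row`, `rb_standard`, `rb_fused`, `make_problem`) is hypothetical. It contrasts a standard Red-Black Gauss-Seidel sweep, which passes over the grid twice per iteration, with one well-known locality technique that fuses the two passes so that each row is revisited while it is likely still in cache. TLB behavior and register-resident values are not modeled here.

```python
RED, BLACK = 0, 1  # a point (i, j) is red when (i + j) is even


def update_row(u, f, h2, i, color):
    """Relax every interior point of the given color in row i (5-point stencil)."""
    n = len(u) - 2                     # interior points per dimension
    start = 1 + (i + 1 + color) % 2    # first interior column j with (i + j) % 2 == color
    for j in range(start, n + 1, 2):
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])


def rb_standard(u, f, h, sweeps):
    """Baseline: one full pass over the grid for red points, then one for black."""
    n = len(u) - 2
    for _ in range(sweeps):
        for color in (RED, BLACK):
            for i in range(1, n + 1):
                update_row(u, f, h * h, i, color)


def rb_fused(u, f, h, sweeps):
    """Fused variant: a single pass per sweep.

    Once the red points of row i are updated, all red neighbors of the black
    points in row i - 1 are final, so that row can be relaxed right away.
    """
    n = len(u) - 2
    for _ in range(sweeps):
        for i in range(1, n + 1):
            update_row(u, f, h * h, i, RED)
            if i > 1:
                update_row(u, f, h * h, i - 1, BLACK)
        update_row(u, f, h * h, n, BLACK)  # black points of the last row


def make_problem(n):
    """Assumed model problem: -Laplace(u) = 1 on the unit square, u = 0 on the boundary."""
    h = 1.0 / (n + 1)
    u = [[0.0] * (n + 2) for _ in range(n + 2)]
    f = [[1.0] * (n + 2) for _ in range(n + 2)]
    return u, f, h
```

Both routines perform the same updates on the same operands, so they should produce the same iterates; only the order in which rows are touched differs. In pure Python the reordering brings no measurable speedup. The sketch only illustrates the traversal order that a compiled implementation could exploit.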