Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures

Authors:
Markus Kowarschik;Ulrich Rüde;Nils Thürey;Christian Weiß
Venue:
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing: Advanced Scientific Computing
Year:
2002


Abstract

Today's computer architectures employ fast cache memories to hide both the low bandwidth and the high latency of main memory accesses, which are slow relative to the floating-point performance of the CPUs. Efficient program execution can only be achieved if codes respect the hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets that are much too large to fit in cache. Standard implementations of these methods therefore do not perform efficiently on cache-based machines. In this paper we present techniques to enhance the cache utilization of multigrid methods on regular mesh structures in 3D, along with various performance results. Most of these techniques extend our previous work on 2D problems.
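
The abstract does not name the individual techniques, so the following is only an illustrative sketch of one widely used locality transformation, loop blocking (tiling), applied to a single sweep over a regular 3D grid. It is not taken from the paper. The 7-point Jacobi-style update, the function names (`make_grid`, `sweep_plain`, `sweep_blocked`), and the block sizes `bj` and `bk` are all assumptions chosen for illustration. The blocked sweep visits the same points in a different order, so that the neighboring planes of a small tile are reused while they could still reside in cache.

```python
def idx(n, i, j, k):
    """Map 3D indices to a position in a flat n*n*n array (k varies fastest)."""
    return (i * n + j) * n + k


def make_grid(n):
    """Build a flat n*n*n grid with an arbitrary, non-constant fill (illustrative data)."""
    return [float((i * 7 + j * 3 + k) % 11)
            for i in range(n) for j in range(n) for k in range(n)]


def update_point(src, dst, n, i, j, k):
    """7-point Jacobi-style update: average of the six nearest neighbors."""
    dst[idx(n, i, j, k)] = (
        src[idx(n, i - 1, j, k)] + src[idx(n, i + 1, j, k)]
        + src[idx(n, i, j - 1, k)] + src[idx(n, i, j + 1, k)]
        + src[idx(n, i, j, k - 1)] + src[idx(n, i, j, k + 1)]
    ) / 6.0


def sweep_plain(src, n):
    """Standard sweep: traverse all interior points plane by plane."""
    dst = list(src)  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update_point(src, dst, n, i, j, k)
    return dst


def sweep_blocked(src, n, bj=4, bk=4):
    """Blocked sweep: tile the j and k loops, then run i through each tile.

    Only a bj-by-bk column of the grid is touched at a time, so the three
    consecutive i-planes of that tile form a much smaller working set.
    """
    dst = list(src)  # boundary values are copied unchanged
    for jj in range(1, n - 1, bj):
        for kk in range(1, n - 1, bk):
            for i in range(1, n - 1):
                for j in range(jj, min(jj + bj, n - 1)):
                    for k in range(kk, min(kk + bk, n - 1)):
                        update_point(src, dst, n, i, j, k)
    return dst


if __name__ == "__main__":
    n = 10
    grid = make_grid(n)
    # Both traversal orders compute the same values for a Jacobi-style update.
    print(sweep_plain(grid, n) == sweep_blocked(grid, n, bj=3, bk=4))
```

Because the update reads only from `src` and writes only to `dst`, reordering the traversal cannot change the result, which is why the two sweeps can be compared directly. Pure Python will not show a measurable cache benefit; the sketch only demonstrates the traversal order that a compiled implementation would use, and suitable block sizes there would depend on the cache sizes of the target machine.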