A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The cache memory book
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
Memory characteristics of iterative methods
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Multigrid
High-performacne parallel implicit CFD
Parallel Computing - Special issue on parallel computing in aerospace
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Efficient Memory Programming
Data Layout Optimizations for Variable Coefficient Multigrid
ICCS '02 Proceedings of the International Conference on Computational Science-Part III
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
Compiler optimizations for avoiding cache conflict misses
Compiler optimizations for avoiding cache conflict misses
Off-loading application controlled data prefetching in numerical codes for multi-core processors
International Journal of Computational Science and Engineering
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Cache optimizations for iterative numerical codes aware of hardware prefetching
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Hi-index | 0.00 |
Today's computer architectures employ fast cache memories in order to hide both the low main memory bandwidth and the latency of main memory accesses, which is slow in contrast to the floating-point performance of the CPUs. Efficient program execution can only be achieved, if the codes respect the hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets, which are much too large to fit in cache. Standard implementations of these methods thus do not perform efficiently on cache-based machines. In this paper we present techniques to enhance the cache utilization of multigrid methods on regular mesh structures in 3D as well as various performance results. Most of these techniques extend our previous work on 2D problems.