In this paper we address the implementation of red-black multigrid smoothers on high-end microprocessors. Most previous work on this topic has focused on cache memory issues because of their tremendous impact on performance. We extend these studies by taking Simultaneous Multithreading (SMT) into account. SMT opens up new possibilities, which makes it advisable to revisit the different implementation alternatives. We propose a new strategy that focuses on inter-thread sharing to tolerate the growing penalties of memory accesses. Performance results on an IBM Power5-based system show that our scheme can compete with, and even outperform, sophisticated schemes based on tailored loop fusion and tiling transformations that aim to improve temporal locality.
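The abstract does not spell out the smoother itself. As a point of reference, the following Python sketch shows a textbook red-black Gauss-Seidel sweep for the 5-point discretization of -∇²u = f on a square grid with fixed (Dirichlet) boundary values, which is an assumption; the paper's grids and operators may differ. It also shows a row-wise fused variant in the spirit of the loop-fusion schemes the abstract mentions. This is not the authors' SMT inter-thread-sharing strategy, whose details are not given here, and the names `relax_row`, `red_black_sweep`, `fused_red_black_sweep`, `residual`, and `residual_norm` are hypothetical.

```python
def relax_row(u, f, h2, i, color):
    """Gauss-Seidel update of the points of one color in interior row i.

    color 0 = "red" points (i + j even), color 1 = "black" points (i + j odd).
    """
    n = len(u)
    # First interior column j >= 1 with (i + j) % 2 == color.
    j_start = 1 + (i + 1 + color) % 2
    for j in range(j_start, n - 1, 2):
        # All four neighbors have the other color, so the updates within one
        # half-sweep are mutually independent (rows could go to different threads).
        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                          + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j])


def red_black_sweep(u, f, h):
    """Unfused sweep: all red points first, then all black points. Modifies u in place."""
    h2 = h * h
    for color in (0, 1):
        for i in range(1, len(u) - 1):
            relax_row(u, f, h2, i, color)
    return u


def fused_red_black_sweep(u, f, h):
    """Same result as red_black_sweep, with the two half-sweeps fused row by row.

    Black row i - 1 is relaxed right after red row i: its red neighbors in rows
    i - 2, i - 1 and i are already up to date, while red row i + 1 (next
    iteration) still sees the old black values of row i, as in the unfused sweep.
    """
    n, h2 = len(u), h * h
    for i in range(1, n - 1):
        relax_row(u, f, h2, i, 0)          # red points of row i
        if i > 1:
            relax_row(u, f, h2, i - 1, 1)  # black points of row i - 1
    relax_row(u, f, h2, n - 2, 1)          # black points of the last interior row
    return u


def residual(u, f, h, i, j):
    """Residual f - A u at interior point (i, j) for the 5-point Laplacian."""
    lap = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
           - u[i][j - 1] - u[i][j + 1]) / (h * h)
    return f[i][j] - lap


def residual_norm(u, f, h):
    """Unscaled L2 norm of the residual over all interior points."""
    n = len(u)
    return sum(residual(u, f, h, i, j) ** 2
               for i in range(1, n - 1) for j in range(1, n - 1)) ** 0.5


if __name__ == "__main__":
    n, h = 9, 1.0 / 8
    f = [[1.0] * n for _ in range(n)]
    u = [[0.0] * n for _ in range(n)]
    for _ in range(200):
        fused_red_black_sweep(u, f, h)
    print(residual_norm(u, f, h))
```

Because every point of one color depends only on points of the other color, the rows of a single half-sweep in `red_black_sweep` can be divided among threads with a barrier needed only between the two colors. The fused variant performs exactly the same floating-point operations in a different order, revisiting each row while it is more likely to still be in cache, but its dependence pattern makes splitting work among threads less straightforward. That trade-off between temporal locality and parallel structure is the design space the abstract describes.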