Efficient solution of computational problems requires a match between the algorithm and the underlying architecture. New multicore processors feature low intra-chip communication cost and smaller per-thread caches than single-core implementations, which indicates that data locality matters more than communication overhead. We investigate the impact of these changes on parallel multigrid methods. We present a temporally blocked, naturally ordered smoother implementation that improves data locality by as much as a factor of ten compared with the standard red-black algorithm. We report the performance of the new algorithm on an SMP system, an UltraSPARC T1 (Niagara) SMT/CMP, and a simulated CMP processor.
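To illustrate what temporal blocking of a naturally ordered smoother can look like, the following is a minimal sketch and not the authors' implementation. It assumes a 1D Poisson model problem with fixed boundary values; the function names `natural_gs` and `blocked_gs`, the grid, and the right-hand side are hypothetical choices made only for this example. The blocked variant fuses several Gauss-Seidel sweeps into one pass over the data by letting sweep `t` trail sweep `t - 1` by one grid point, and it produces the same iterates as running the sweeps one after another.

```python
def natural_gs(u, f, h, sweeps):
    """Reference: `sweeps` separate naturally ordered Gauss-Seidel passes
    for -u'' = f on a uniform 1D grid with fixed end values."""
    u = list(u)
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


def blocked_gs(u, f, h, sweeps):
    """Temporally blocked variant (illustrative assumption, 1D only).

    A wavefront position p moves once across the grid. At each position,
    sweep t updates point i = p - t, so sweep t always sees u[i-1] already
    updated by sweep t and u[i+1] updated by sweep t-1, exactly as in the
    unfused version.
    """
    u = list(u)
    n = len(u)
    for p in range(1, n - 2 + sweeps):      # last sweep must reach i = n-2
        for t in range(sweeps):             # earlier sweeps run ahead
            i = p - t
            if 1 <= i <= n - 2:             # interior points only
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


n = 17
h = 1.0 / (n - 1)
f = [1.0] * n
u0 = [0.0] * n
reference = natural_gs(u0, f, h, 4)
blocked = blocked_gs(u0, f, h, 4)
```

In this sketch every grid point receives all of its updates while the wavefront is within `sweeps` points of it, so the working set is a small sliding window rather than the whole grid. Extending the idea to 2D or 3D grids, to multigrid cycles, and to multiple threads involves design choices that this example does not cover.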