Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture

Authors:
Sverker Holmgren;Dan Wallin
Affiliations:
-;-
Venue:
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Year:
2001

Abstract

High-accuracy PDE solvers use multi-dimensional fast Fourier transforms (FFTs). The FFTs exhibit a static and structured memory access pattern that results in a large amount of communication. A performance analysis of a non-trivial kernel representing a PDE solution algorithm has been carried out on a Sun WildFire computer, on which different architecture, system and programming models can be studied. The WildFire system uses self-optimization techniques such as data migration and replication to change the placement of data at runtime. If the initial data placement is not optimal, performance is degraded at first. However, after a few iterations the page migration daemon is able to modify the placement of data. Performance then improves and equals what is achieved when the data is optimally placed at the start of execution by hand tuning. The speedup for the PDE solution kernel is surprisingly good.
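
To make the communication claim concrete, the following is a minimal sketch of why a multi-dimensional FFT generates heavy remote traffic on a NUMA machine. It is not taken from the paper. It assumes a square n x n grid, a row-block placement of data across memory nodes, and a 2D FFT computed as 1D transforms along rows followed by 1D transforms along columns; the names `owner` and `remote_fraction` are hypothetical. Under these assumptions the row phase touches only local data, while the column phase reads most of its elements from other nodes.

```python
def owner(index, n, nodes):
    """Node that owns row (or column) `index` under a block distribution.

    Assumption: indices 0..n-1 are split into `nodes` contiguous blocks.
    """
    return index * nodes // n


def remote_fraction(n, nodes):
    """Fraction of element accesses in the column-FFT phase that are remote.

    Assumption: element (row, col) is stored on the node that owns `row`,
    and the 1D transform along column `col` runs on the node that owns `col`.
    """
    remote = 0
    for col in range(n):
        worker = owner(col, n, nodes)
        for row in range(n):
            # The element lives with its row; the worker is chosen by column.
            if owner(row, n, nodes) != worker:
                remote += 1
    return remote / (n * n)


if __name__ == "__main__":
    for nodes in (1, 2, 4):
        print(nodes, remote_fraction(64, nodes))
```

In this simplified model the remote share of the column phase is (nodes - 1) / nodes whenever n is divisible by the node count, so it grows with the number of nodes. Because the pattern is static, the same pages are accessed remotely in every iteration, which is the kind of regularity that runtime migration or replication can exploit.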