On cc-NUMA multiprocessors, main memory latency is non-uniform, which motivates co-locating threads with the data they access. We call this special form of data locality geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement (AMR). The solver is parallelized using OpenMP, and the adaptive mesh refinement makes dynamic load balancing necessary. Because the runtime adaptation causes the memory access pattern to change dynamically, achieving a high degree of geographical locality is challenging. The main conclusions of the study are: (1) geographical locality is very important for the performance of the solver; (2) performance can be improved significantly by dynamic page migration of misplaced data; (3) a migrate-on-next-touch directive works well, whereas the first-touch strategy is less advantageous for programs exhibiting dynamically changing memory access patterns; and (4) the overhead of such migration is low compared to the total execution time.
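The difference between first-touch placement and migrate-on-next-touch can be illustrated with a toy model. The Python sketch below is not the solver or the directive studied in the article. It assumes a hypothetical machine with two nodes and eight pages, a page ownership that shifts after each load-balancing step, and a fixed number of sweeps over every page per phase. The names `simulate`, `owners`, `PHASES`, and `sweeps` are introduced here purely for illustration, and the first touch after a repartitioning is counted as a migration rather than as a remote access.

```python
def simulate(phases, sweeps, migrate_on_next_touch):
    """Count remote accesses and page migrations in a toy NUMA model.

    phases: list of dicts mapping page id -> node whose thread works on
            that page during the phase (one dict per load-balancing step).
    sweeps: number of times each page is accessed within a phase.
    migrate_on_next_touch: if True, a misplaced page moves to the
            accessing node the first time it is touched in a phase.
    """
    home = {}              # page id -> node where the page currently resides
    remote_accesses = 0
    migrations = 0
    for phase_owners in phases:
        for page, node in phase_owners.items():
            if page not in home:
                # First touch: the page is placed on the touching node.
                home[page] = node
            elif home[page] != node and migrate_on_next_touch:
                # Misplaced page: move it to the node that now uses it.
                home[page] = node
                migrations += 1
            if home[page] != node:
                # Page stays remote for every sweep in this phase.
                remote_accesses += sweeps
    return remote_accesses, migrations


def owners(split, n_pages=8):
    """Assumed partitioning: pages below `split` belong to node 0, the rest to node 1."""
    return {page: (0 if page < split else 1) for page in range(n_pages)}


# Assumed workload: the partition boundary moves as the mesh is refined.
PHASES = [owners(4), owners(6), owners(2)]

first_touch = simulate(PHASES, sweeps=10, migrate_on_next_touch=False)
next_touch = simulate(PHASES, sweeps=10, migrate_on_next_touch=True)
```

In this model, pure first-touch placement leaves pages on the node that initialized them, so every repartitioning adds remote accesses that recur on each sweep. With migration, the cost is paid once per misplaced page and later sweeps are local. The model ignores the actual cost of a migration and of a remote access, so it shows only the mechanism, not the measured speedups.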