Reference history, page size, and migration daemons in local/remote architectures
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Simple but effective techniques for NUMA memory management
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Scheduling and page migration for multiprocessor compute servers
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Data distribution support on distributed shared memory multiprocessors
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Tapeworm: high-level abstractions of shared accesses
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
Scaling application performance on a cache-coherent multiprocessor
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
IEEE Transactions on Parallel and Distributed Systems
A case for user-level dynamic page migration
Proceedings of the 14th international conference on Supercomputing
OpenMP on networks of workstations
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
The trade-off between implicit and explicit data distribution in shared-memory programming paradigms
ICS '01 Proceedings of the 15th international conference on Supercomputing
Exploiting memory affinity in OpenMP through schedule reuse
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Scaling irregular parallel codes with minimal programming effort
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models
International Journal of Parallel Programming
OpenMP versus MPI for PDE Solvers Based on Regular Sparse Numerical Operators
ICCS '02 Proceedings of the International Conference on Computational Science-Part III
Automation of Data Traffic Control on DSM Architectures
ICCS '01 Proceedings of the International Conference on Computational Science-Part II
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Effective Cross-Platform, Multilevel Parallelism via Dynamic Adaptive Execution
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
User-controllable coherence for high performance shared memory multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Algorithm engineering for parallel computation
Experimental algorithmics
OpenMP versus MPI for PDE solvers based on regular sparse numerical operators
Future Generation Computer Systems
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
Scientific Programming - OpenMP
OpenMP issues arising in the development of parallel BLAS and LAPACK libraries
Scientific Programming - OpenMP
Scaling non-regular shared-memory codes by reusing custom loop schedules
Scientific Programming - OpenMP
Towards optimisation of openMP codes for synchronisation and data reuse
International Journal of High Performance Computing and Networking
Data and thread affinity in openmp programs
Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?
Large-scale phylogenetic analysis on current HPC architectures
Scientific Programming - Large-Scale Programming Tools and Environments
OpenMP versus MPI for PDE solvers based on regular sparse numerical operators
Future Generation Computer Systems
Language support for multi-paradigm and multi-grain parallelism on SMP-Cluster
International Journal of Computers and Applications
OpenMP and NUMA architectures I: Investigating memory placement on the SGI origin 3000
ICCS'03 Proceedings of the 2003 international conference on Computational science
Asynchronous execution of OpenMP code
ICCS'03 Proceedings of the 2003 international conference on Computational science
Analyses for the translation of OpenMP codes into SPMD style with array privatization
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Improving the performance of OpenMP by array privatization
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Towards NUMA support with distance information
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Matching memory access patterns and data placement for NUMA systems
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Node-based memory management for scalable NUMA architectures
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Hi-index | 0.00 |
This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of state-of-the-art ccNUMA architectures, reasonably balanced page placement schemes-such as round-robin or random distribution of pages-incur modest performance losses. We also show that performance leaks stemming from suboptimal page placement schemes can be remedied with a smart user-level page migration engine. The main body of the paper describes how the OpenMP runtimeenvironment can use page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results support the effectiveness of these mechanisms and provide a proof of concept that there is no need to introduce data distribution directives in OpenMP and warrant the portability of the programming model.