We present the design and implementation of UPMLIB, a runtime system that transparently and dynamically tunes the memory performance of OpenMP programs on scalable shared-memory multiprocessors with hardware cache coherence. UPMLIB combines information from the compiler and the operating system to implement algorithms that perform accurate and timely page migrations. These algorithms and their supporting mechanisms correlate memory reference information with the semantics of parallel programs and with scheduling events that, at runtime, break the association between threads and the data for which they have memory affinity. Our experimental evidence shows that UPMLIB makes OpenMP programs immune to the page placement strategy of the operating system, thus obviating the need for data placement directives in OpenMP. Furthermore, UPMLIB provides solid improvements in throughput in multiprogrammed execution environments.