Large state-of-the-art NUMA systems may offer more than two levels of node distance. The result is a hierarchical architecture with significant differences in memory access bandwidth and latency. Consequently, NUMA-aware memory management and the reduction of remote memory accesses are increasingly becoming the key challenge for the operating system and its applications. In this paper, we show that traditional, centralized approaches to paging are no longer adequate for these architectures. We present a prototype of a new node-based memory management for the Linux kernel and demonstrate its scalability and usability. The kernel mirrors the hardware architecture by managing one page mapping table per NUMA node, and its page fault handler is extended to create node-local references. Based on this prototype, we suggest extensions that simplify the detection of performance issues and thereby increase the usability of such architectures.
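The abstract gives no implementation details, so what follows is only a minimal user-space model of the general idea, not the authors' kernel code. Instead of one centralized page table, each NUMA node keeps its own table, and a fault on node `n` fills in only node `n`'s table. Several details are our own assumptions: frames are placed by first touch, a shared dictionary records which frame backs each virtual page, and per-node fault and remote-access counters stand in for the kind of diagnostics the paper proposes. All names (`NodeBasedPaging`, `Frame`, `access`, `NUM_NODES`) are hypothetical.

```python
from dataclasses import dataclass

NUM_NODES = 4  # hypothetical machine with four NUMA nodes


@dataclass(frozen=True)
class Frame:
    node: int   # NUMA node whose memory holds the frame
    index: int  # frame number within that node


class NodeBasedPaging:
    """Toy model (assumption): one page table per NUMA node, not one global table."""

    def __init__(self, num_nodes: int = NUM_NODES) -> None:
        self.num_nodes = num_nodes
        # Shared record of which frame backs each virtual page (set on first touch).
        self.backing: dict[int, Frame] = {}
        self.next_frame = [0] * num_nodes
        # Per-node page tables: node -> {virtual page -> frame}.
        self.tables: list[dict[int, Frame]] = [{} for _ in range(num_nodes)]
        # Hypothetical per-node counters that could help locate remote-access hot spots.
        self.faults = [0] * num_nodes
        self.remote = [0] * num_nodes

    def _handle_fault(self, node: int, vpage: int) -> None:
        if vpage not in self.backing:
            # First touch (assumed policy): allocate the frame on the faulting node.
            self.backing[vpage] = Frame(node, self.next_frame[node])
            self.next_frame[node] += 1
        # Populate only the faulting node's own table.
        self.tables[node][vpage] = self.backing[vpage]
        self.faults[node] += 1

    def access(self, node: int, vpage: int) -> Frame:
        """Translate vpage for a thread running on `node`."""
        if vpage not in self.tables[node]:
            self._handle_fault(node, vpage)
        frame = self.tables[node][vpage]
        if frame.node != node:
            self.remote[node] += 1  # the access crosses the interconnect
        return frame


mm = NodeBasedPaging()
mm.access(0, 3)  # node 0 touches page 3 first: frame placed on node 0
mm.access(0, 3)  # hit in node 0's own table, no fault
mm.access(2, 3)  # node 2 faults and gets its own entry; the frame is remote
mm.access(2, 5)  # node 2 first-touches page 5: local frame
print(mm.faults, mm.remote)  # [1, 0, 2, 0] [0, 0, 1, 0]
```

Because each node resolves translations from its own table, page-table lookups never have to leave the node. The per-node counters also make it easy to see which node pays for remote accesses, such as node 2 reading page 3 above.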