Node-based memory management for scalable NUMA architectures

Authors:
Stefan Lankes;Thomas Bemmerl;Thomas Roehl;Christian Terboven
Affiliations:
RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany
Venue:
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
Year:
2012

Abstract

Large state-of-the-art NUMA systems may have more than two levels of node distance. The result is a hierarchical architecture with significant differences in memory access bandwidth and latency. Consequently, NUMA-aware memory management and the reduction of remote memory accesses are increasingly becoming the key challenge for the operating system and its applications. In this paper, we show that traditional, centralized concepts for realizing paging are no longer an adequate approach for these architectures. We present a prototype of a new node-based memory management for the Linux kernel and demonstrate its scalability and usability. The hardware architecture is reflected by managing one page mapping table per NUMA node, and the kernel's page fault handler is extended to create node-local references. Based on this prototype, we suggest extensions that simplify the detection of performance issues and thereby increase the usability of such architectures.
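The central mechanism of the abstract, one page mapping table per NUMA node plus a page fault handler that fills in the faulting node's own table, can be illustrated with a small user-space model. This is a minimal sketch under stated assumptions, not the authors' Linux kernel implementation: the names `NodeBasedMM`, `handle_fault` and `access` are hypothetical, the search of other nodes' tables for an existing mapping and the first-touch frame allocation are assumed policies, and locking, TLB shootdown and page migration are not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical user-space model of node-based paging.
# All names and policies here are illustrative assumptions.


@dataclass
class PTE:
    frame: int        # physical frame number (per-node numbering)
    frame_node: int   # NUMA node whose memory holds the frame


@dataclass
class NodeMM:
    """Per-node state: each NUMA node owns its own page mapping table."""
    node: int
    table: dict = field(default_factory=dict)  # virtual page -> PTE
    next_frame: int = 0                        # trivial per-node frame allocator


class NodeBasedMM:
    def __init__(self, num_nodes: int):
        self.nodes = [NodeMM(n) for n in range(num_nodes)]
        self.faults = 0

    def _alloc_frame(self, node: int) -> PTE:
        # Allocate the next free frame from the given node's memory.
        mm = self.nodes[node]
        pte = PTE(frame=mm.next_frame, frame_node=node)
        mm.next_frame += 1
        return pte

    def handle_fault(self, node: int, vpage: int) -> PTE:
        """Populate only the faulting node's table; other tables are read, never written."""
        self.faults += 1
        for other in self.nodes:
            pte = other.table.get(vpage)
            if pte is not None:
                # Page is already backed: add a node-local reference
                # (an entry in this node's table) to the existing frame.
                self.nodes[node].table[vpage] = pte
                return pte
        # First touch on any node (assumed policy): allocate on the faulting node.
        pte = self._alloc_frame(node)
        self.nodes[node].table[vpage] = pte
        return pte

    def access(self, node: int, vpage: int) -> bool:
        """Translate vpage on `node`; return True if the backing frame is node-local."""
        pte = self.nodes[node].table.get(vpage)
        if pte is None:
            pte = self.handle_fault(node, vpage)
        return pte.frame_node == node
```

In this model the translation path (`access`) consults only the calling node's table, so nodes do not contend on one central table, and a remote frame is easy to spot by comparing `frame_node` with the accessing node. The price is that unmapping or migrating a page would have to update every node table holding an entry for it, which the sketch does not implement.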