As core counts on modern multiprocessor systems increase, so does memory contention, because more processes and threads try to access main memory simultaneously. This is typical of UMA (Uniform Memory Access) architectures, in which a single physical memory bank limits the scalability of multithreaded applications. To alleviate this problem, modern systems are increasingly moving toward Nonuniform Memory Access (NUMA) architectures, in which physical memory is split into several (typically two or four) banks. Each memory bank is associated with a set of cores, so threads can work out of their own local memory bank while still sharing a single virtual address space. However, accessing shared data structures that reside in remote memory banks can be considerably slower than accessing local memory. This paper proposes a way to determine where certain parts of the shared data should reside and to pin them to specific memory banks, thus minimizing remote accesses. To achieve this, existing application code can be augmented with the proposed interface, which sets up the shared data and distributes it appropriately among the memory banks. Experiments were performed with the NAS CG benchmark and with a realistic large-scale application that computes ab initio nuclear structure. The proposed approach achieved speedups of up to 3.5 times compared with the default memory placement policy.
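Why placement matters can be illustrated with a toy model. The sketch below is not the paper's interface. It is a hypothetical Python model that assumes one thread per memory bank, a contiguous row-block partition of a sparse matrix-vector product y = A·x (the core kernel of CG), and placement at the granularity of individual elements, whereas real systems place data page by page. The model counts local and remote accesses in two cases. In the first, all data sits on bank 0, which is what a first-touch default policy produces when a single thread initializes everything. In the second, each block of matrix rows and vector entries is placed on the bank of the thread that uses it.

```python
"""Toy model of NUMA data placement for a row-partitioned SpMV, y = A @ x.

Hypothetical sketch: one thread per memory bank, element-granularity
placement, and the two placement policies below are illustrative
assumptions, not the interface proposed in the paper.
"""
from dataclasses import dataclass


@dataclass
class CSR:
    """Compressed sparse row matrix; row_ptr has n_rows + 1 entries."""
    row_ptr: list
    col_idx: list
    values: list

    @property
    def n_rows(self):
        return len(self.row_ptr) - 1


def tridiagonal(n):
    """Build an n x n tridiagonal (1-D Laplacian-like) matrix in CSR form."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                col_idx.append(j)
                values.append(2.0 if j == i else -1.0)
        row_ptr.append(len(col_idx))
    return CSR(row_ptr, col_idx, values)


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous blocks of near-equal size."""
    return [(p * n // parts, (p + 1) * n // parts) for p in range(parts)]


def owner_map(n, parts):
    """owner[i] = index of the block (and hence thread/bank) that owns element i."""
    owner = [0] * n
    for p, (lo, hi) in enumerate(block_ranges(n, parts)):
        for i in range(lo, hi):
            owner[i] = p
    return owner


def count_accesses(A, banks, row_bank, x_bank, y_bank):
    """Return (local, remote) access counts for y = A @ x.

    Thread t runs on bank t and processes the t-th contiguous block of rows.
    row_bank[i] is the bank holding row i's values/col_idx entries;
    x_bank[j] and y_bank[i] are the banks holding x[j] and y[i].
    """
    local = remote = 0
    for t, (lo, hi) in enumerate(block_ranges(A.n_rows, banks)):
        for i in range(lo, hi):
            touched = []
            # Each nonzero reads values[k] and col_idx[k] (stored with row i) and x[j].
            for k in range(A.row_ptr[i], A.row_ptr[i + 1]):
                touched += [row_bank[i], row_bank[i], x_bank[A.col_idx[k]]]
            touched.append(y_bank[i])  # one write of y[i]
            for b in touched:
                if b == t:
                    local += 1
                else:
                    remote += 1
    return local, remote


if __name__ == "__main__":
    A = tridiagonal(8)
    n, banks = A.n_rows, 2
    # Default-like placement: everything on bank 0, e.g. data initialized
    # serially by one thread under a first-touch policy.
    on_bank0 = [0] * n
    # Distributed placement: each block lives on the bank of the thread using it.
    owner = owner_map(n, banks)
    print("default     (local, remote):", count_accesses(A, banks, on_bank0, on_bank0, on_bank0))
    print("distributed (local, remote):", count_accesses(A, banks, owner, owner, owner))
```

For the 8-row tridiagonal matrix on two banks, the model reports 37 local and 37 remote accesses with everything on bank 0, but 72 local and 2 remote accesses with the distributed placement. The only remaining remote accesses are the reads of x entries that straddle the block boundary. Real speedups depend on page granularity, bandwidth, and latency, which this count-only model ignores.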