Nonuniform memory affinity strategy in multithreaded sparse matrix computations

  • Authors:
  • Avinash Srinivasa; Masha Sosonkina

  • Affiliations:
  • Iowa State University, Ames, IA; Iowa State University, Ames, IA

  • Venue:
  • Proceedings of the 2012 Symposium on High Performance Computing
  • Year:
  • 2012

Abstract

As core counts on modern multiprocessor systems increase, so does memory contention, because all processes or threads try to access main memory simultaneously. This contention is typical of UMA (Uniform Memory Access) architectures, which have a single physical memory bank, and it leads to poor scalability in multithreaded applications. To alleviate this problem, modern systems are increasingly moving toward Nonuniform Memory Access (NUMA) architectures, in which physical memory is split into several (typically two or four) banks. Each memory bank is associated with a set of cores, so threads can operate on their own physical memory bank while retaining a shared virtual address space. However, accessing shared data structures that reside in remote memory banks may become increasingly costly. This paper proposes a way to determine which parts of the shared data should be pinned to which memory banks, thus minimizing remote accesses. To this end, existing application code can be augmented with the proposed interface, which sets up the shared data and distributes it appropriately among the memory banks. Experiments were performed with the NAS CG benchmark and with a realistic large-scale application that performs ab initio nuclear structure calculations. The proposed approach achieved speedups of up to 3.5 times over the default memory placement policy.
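The abstract does not say how the shared data are divided among memory banks. As a hypothetical illustration only, not the authors' interface, the Python sketch below assumes a sparse matrix stored in CSR format and splits its rows into contiguous blocks, one per memory bank, so that each block holds roughly the same number of nonzeros. It then reports the byte ranges of the values and column-index arrays that each bank would own. The function names `partition_rows_by_nnz` and `pin_plan`, and the 8-byte values and 4-byte indices, are assumptions. The actual pinning step is not shown; on Linux it could be done, for example, with libnuma's `numa_tonode_memory` or the `mbind` system call, both of which work at page granularity, so real ranges would need to be rounded to page boundaries.

```python
from bisect import bisect_left


def partition_rows_by_nnz(row_ptr, num_nodes):
    """Split CSR rows into num_nodes contiguous blocks with roughly equal nonzeros.

    row_ptr is the standard CSR row-pointer array of length n_rows + 1.
    Returns a list of half-open (row_start, row_end) ranges, one per node.
    """
    n_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = [0]
    for k in range(1, num_nodes):
        target = k * total_nnz / num_nodes
        # First row boundary whose cumulative nonzero count reaches the target.
        r = bisect_left(row_ptr, target)
        # Step back one row if the previous boundary is closer to the target.
        if r > 0 and target - row_ptr[r - 1] < row_ptr[r] - target:
            r -= 1
        # Keep boundaries non-decreasing and inside the matrix.
        r = min(max(r, bounds[-1]), n_rows)
        bounds.append(r)
    bounds.append(n_rows)
    return [(bounds[i], bounds[i + 1]) for i in range(num_nodes)]


def pin_plan(row_ptr, num_nodes, value_bytes=8, index_bytes=4):
    """Hypothetical placement plan: which byte ranges of the CSR arrays each node owns.

    value_bytes and index_bytes are assumed element sizes (e.g. double and int32).
    A real implementation would hand these ranges to a NUMA binding call.
    """
    plan = []
    for node, (r0, r1) in enumerate(partition_rows_by_nnz(row_ptr, num_nodes)):
        lo, hi = row_ptr[r0], row_ptr[r1]  # nonzero index range for these rows
        plan.append({
            "node": node,
            "rows": (r0, r1),
            "values_bytes": (lo * value_bytes, hi * value_bytes),
            "col_idx_bytes": (lo * index_bytes, hi * index_bytes),
        })
    return plan


# Skewed example: three rows with one nonzero each, then one row with seven.
print(partition_rows_by_nnz([0, 1, 2, 3, 10], 2))  # [(0, 3), (3, 4)]
```

Balancing by nonzeros rather than by row count is one plausible choice for sparse matrix-vector products, where work per row is proportional to its nonzeros; threads bound to a given bank would then process only the rows whose data live there.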