A template library to integrate thread scheduling and locality management for NUMA multiprocessors

  • Authors: Zoltan Majo, Thomas R. Gross
  • Affiliations: Department of Computer Science, ETH Zurich (both authors)
  • Venue: HotPar'12: Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism
  • Year: 2012

Abstract

Many multicore multiprocessors have a non-uniform memory architecture (NUMA), and for good performance, data and computations must be partitioned so that (ideally) all threads execute on the processor that holds their data. However, many multithreaded applications make heavy use of shared data structures that are accessed by all threads of the application, and automatic data placement and thread scheduling for such applications remains difficult. We present a template library for shared data structures that allows a programmer to express both the data layout (how the data space is partitioned) and the thread mapping and scheduling (when and where a thread executes). The template library supports programmers in dividing computations and data to reduce the fraction of costly remote memory accesses on NUMA multicore multiprocessors. Initial experience with ferret, a program with irregular memory access patterns from the PARSEC benchmark suite, shows that this approach can reduce the share of remote accesses from 42% to 10% and yields a performance improvement of 3% without overwhelming the programmer.
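The abstract does not show the library's interface, so the following is only a minimal, hypothetical sketch of the general idea it describes: a C++ template that partitions a shared array across NUMA nodes and runs each worker thread on the node that holds its partition. The class name `PartitionedArray`, its methods, and the use of libnuma (`numa_alloc_onnode`, `numa_run_on_node`) are assumptions for illustration, not the paper's actual API.

```cpp
// Hypothetical sketch of NUMA-aware data partitioning + thread placement.
// Not the paper's library. Requires libnuma (compile with -lnuma).
#include <numa.h>
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class PartitionedArray {
public:
    // Split n elements into one contiguous chunk per NUMA node and
    // allocate each chunk on its node's local memory.
    PartitionedArray(std::size_t n, int nodes) : nodes_(nodes), parts_(nodes) {
        std::size_t chunk = (n + nodes - 1) / nodes;
        for (int node = 0; node < nodes; ++node) {
            std::size_t begin = std::min<std::size_t>(node * chunk, n);
            std::size_t len = std::min(chunk, n - begin);
            T* p = len ? static_cast<T*>(
                             numa_alloc_onnode(len * sizeof(T), node))
                       : nullptr;
            parts_[node] = {p, len};
        }
    }
    ~PartitionedArray() {
        for (auto& p : parts_)
            if (p.first) numa_free(p.first, p.second * sizeof(T));
    }

    // Run fn(partition_ptr, length) once per node, with the worker thread
    // pinned to the node that owns the partition it touches.
    void for_each_partition(std::function<void(T*, std::size_t)> fn) {
        std::vector<std::thread> workers;
        for (int node = 0; node < nodes_; ++node) {
            workers.emplace_back([this, node, fn] {
                numa_run_on_node(node);  // schedule worker near its data
                fn(parts_[node].first, parts_[node].second);
            });
        }
        for (auto& w : workers) w.join();
    }

private:
    int nodes_;
    std::vector<std::pair<T*, std::size_t>> parts_;
};

int main() {
    if (numa_available() < 0) return 1;  // no NUMA support on this machine
    int nodes = numa_max_node() + 1;
    PartitionedArray<double> a(1 << 20, nodes);
    // Each worker initializes only the chunk resident on its own node,
    // so these writes are local rather than remote memory accesses.
    a.for_each_partition([](double* p, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) p[i] = 0.0;
    });
    return 0;
}
```

The point of coupling layout and scheduling in one abstraction, as the abstract argues, is that neither alone suffices: local allocation helps only if the threads that touch a partition actually run on the node where it was placed.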