Critical path-based thread placement for NUMA systems

  • Authors:
  • ChunYi Su; Dong Li; Dimitrios S. Nikolopoulos; Matthew Grove; Kirk Cameron; Bronis R. de Supinski

  • Affiliations:
  • Virginia Tech, Blacksburg, VA, USA; Oak Ridge National Lab, Oak Ridge, TN, USA; FORTH-ICS, Heraklion, Crete, Greece; Virginia Tech, Blacksburg, VA, USA; Virginia Tech, Blacksburg, VA, USA; LLNL, Livermore, CA, USA

  • Venue:
  • ACM SIGMETRICS Performance Evaluation Review
  • Year:
  • 2012

Abstract

Multicore multiprocessors use a Non-Uniform Memory Architecture (NUMA) to improve their scalability. However, NUMA introduces performance penalties due to remote memory accesses. Without efficient management of data layout and of the mapping of threads to cores, scientific applications may suffer performance loss, even if they are optimized for NUMA. In this paper, we present algorithms and a runtime system that optimize the execution of OpenMP applications on NUMA architectures. By collecting information from hardware counters, the runtime system directs thread placement and reduces performance penalties by minimizing the critical path of OpenMP parallel regions. The runtime system uses a scalable algorithm that derives placement decisions with negligible overhead. We evaluate our algorithms and the runtime system with four NPB applications implemented in OpenMP. On average, the algorithms improve performance by between 8.13% and 25.68% compared to the default Linux thread placement scheme. The algorithms miss the optimal thread placement in only 8.9% of the cases.
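
To make the idea of critical path-based placement concrete, the following is a minimal sketch under stated assumptions, not the algorithm from the paper. It assumes a toy cost model in which a thread's time is its compute time plus its memory accesses weighted by a node-to-node latency table, and it takes the critical path of a parallel region to be the time of the slowest thread. The names `critical_path` and `best_placement`, the cost model, and the exhaustive search are all hypothetical; the abstract describes a scalable algorithm driven by hardware counters, which this brute-force search does not attempt to reproduce.

```python
from itertools import permutations


def critical_path(placement, compute, accesses, latency):
    """Estimated time of the slowest thread in one parallel region.

    placement[t]   : NUMA node that thread t runs on
    compute[t]     : compute time of thread t (assumed input)
    accesses[t][m] : number of accesses thread t makes to memory on node m
    latency[a][b]  : cost of one access from a core on node a to memory on node b
    """
    return max(
        compute[t]
        + sum(count * latency[node][mem] for mem, count in enumerate(accesses[t]))
        for t, node in enumerate(placement)
    )


def best_placement(compute, accesses, latency, cores_per_node):
    """Exhaustively pick the placement with the shortest critical path.

    Only practical for tiny inputs; used here purely for illustration.
    """
    # One slot per core, labelled with the node it belongs to.
    slots = [node for node in range(len(latency)) for _ in range(cores_per_node)]
    candidates = set(permutations(slots, len(compute)))
    # Break ties on the placement tuple so the result is deterministic.
    return min(
        candidates,
        key=lambda p: (critical_path(p, compute, accesses, latency), p),
    )
```

In this toy model, a thread whose data lives on a remote node lengthens the critical path, so the search tends to move each thread next to the memory it accesses most, subject to the number of cores available per node.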