Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead

  • Authors:
  • Zoltan Majo; Thomas R. Gross

  • Affiliations:
  • ETH Zurich, Zurich, Switzerland; ETH Zurich, Zurich, Switzerland

  • Venue:
  • Proceedings of the International Symposium on Memory Management
  • Year:
  • 2011

Abstract

Multiprocessors based on processors with multiple cores usually include a non-uniform memory architecture (NUMA); even current 2-processor systems with 8 cores exhibit non-uniform memory access times. As the cores of a processor share a common cache, the issues of memory management and process mapping must be revisited. We find that optimizing only for data locality can counteract the benefits of cache contention avoidance and vice versa. Therefore, system software must take both data locality and cache contention into account to achieve good performance, and memory management cannot be decoupled from process scheduling. We present a detailed analysis of a commercially available NUMA-multicore architecture, the Intel Nehalem. We describe two scheduling algorithms: maximum-local, which optimizes for maximum data locality, and its extension, N-MASS, which reduces data locality to avoid the performance degradation caused by cache contention. N-MASS is fine-tuned to support memory management on NUMA-multicores and improves performance by up to 32%, and by 7% on average, over the default setup in current Linux implementations.
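The abstract does not spell out either algorithm, so the following is only a toy model of the trade-off it describes, not maximum-local or N-MASS as the authors implement them. It assumes two NUMA nodes, a fixed home node per process (where its memory lives), and hypothetical per-process `cache_pressure` and `numa_penalty` scores; it also ignores the per-node core limit. All names (`Proc`, `maximum_local`, `rebalance`, `threshold`) are illustrative. The first function places every process next to its data; the second then gives up locality for the processes that suffer least from remote access until cache pressure on the two nodes is roughly balanced.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    home_node: int         # node holding this process's memory (assumed fixed)
    cache_pressure: float  # hypothetical score for shared-cache pressure
    numa_penalty: float    # hypothetical slowdown when run on the remote node


def maximum_local(procs):
    """Map every process to the node that holds its data (pure locality)."""
    return {p.name: p.home_node for p in procs}


def pressure(mapping, procs, node):
    """Total cache pressure of the processes currently mapped to `node`."""
    return sum(p.cache_pressure for p in procs if mapping[p.name] == node)


def rebalance(procs, threshold):
    """Toy extension for 2 nodes: trade locality for balanced cache pressure.

    Starting from the maximum-local mapping, repeatedly move the process with
    the smallest NUMA penalty from the more loaded node to the other node,
    as long as the move narrows the pressure gap and the gap exceeds
    `threshold`. Each process is moved at most once, so the loop terminates.
    """
    mapping = maximum_local(procs)
    moved = set()
    while True:
        p0, p1 = pressure(mapping, procs, 0), pressure(mapping, procs, 1)
        gap = abs(p0 - p1)
        if gap <= threshold:
            break
        heavy, light = (0, 1) if p0 >= p1 else (1, 0)
        # Moving a process with pressure c changes the gap from g to |g - 2c|.
        candidates = [
            p for p in procs
            if mapping[p.name] == heavy
            and p.name not in moved
            and abs(gap - 2 * p.cache_pressure) < gap
        ]
        if not candidates:
            break
        victim = min(candidates, key=lambda p: p.numa_penalty)
        mapping[victim.name] = light
        moved.add(victim.name)
    return mapping


procs = [
    Proc("A", home_node=0, cache_pressure=3.0, numa_penalty=1.4),
    Proc("B", home_node=0, cache_pressure=2.0, numa_penalty=1.1),
    Proc("C", home_node=0, cache_pressure=2.5, numa_penalty=1.3),
    Proc("D", home_node=1, cache_pressure=0.5, numa_penalty=1.0),
]

local = maximum_local(procs)
mapping = rebalance(procs, threshold=2.0)
print(local)    # {'A': 0, 'B': 0, 'C': 0, 'D': 1}
print(mapping)  # {'A': 0, 'B': 1, 'C': 1, 'D': 1}
```

In this example, pure locality leaves node 0 with a pressure of 7.5 against 0.5 on node 1. The toy rebalancer moves B and then C, which are the cheapest to run remotely among the helpful candidates, and stops once the gap shrinks to the threshold. This mirrors the abstract's point that some locality has to be sacrificed to reduce cache contention.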