Traffic management: a holistic approach to memory placement on NUMA systems

  • Authors:
  • Mohammad Dashti;Alexandra Fedorova;Justin Funston;Fabien Gaud;Renaud Lachaize;Baptiste Lepers;Vivien Quema;Mark Roth

  • Affiliations:
  • Simon Fraser University, Burnaby, Canada;Simon Fraser University, Burnaby, Canada;Simon Fraser University, Burnaby, Canada;Simon Fraser University, Burnaby, Canada;UJF, Grenoble, France;CNRS, Grenoble, France;Grenoble INP, Grenoble, France;Simon Fraser University, Burnaby, Canada

  • Venue:
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2013

Abstract

NUMA systems are characterized by Non-Uniform Memory Access times, where accessing data in a remote node takes longer than a local access. NUMA hardware has been built since the late 1980s, and the operating systems designed for it were optimized for access locality. They co-located memory pages with the threads that accessed them, so as to avoid the cost of remote accesses. Contrary to older systems, modern NUMA hardware has much smaller remote wire delays, and so remote access costs per se are not the main concern for performance, as we discovered in this work. Instead, congestion on memory controllers and interconnects, caused by memory traffic from data-intensive applications, hurts performance far more. Because of that, memory placement algorithms must be redesigned to target traffic congestion. This requires an arsenal of techniques that go beyond optimizing locality. In this paper we describe Carrefour, an algorithm that addresses this goal. We implemented Carrefour in Linux and obtained performance improvements of up to 3.6× relative to the default kernel, as well as significant improvements compared to NUMA-aware patchsets available for Linux. Carrefour never hurts performance by more than 4% when memory placement cannot be improved. We present the design of Carrefour, describe the challenges of implementing it on modern hardware, and draw insights about hardware support that would help optimize system software on future NUMA systems.