BPM/BPM+: Software-based dynamic memory partitioning mechanisms for mitigating DRAM bank-/channel-level interferences in multicore systems

  • Authors:
  • Lei Liu;Zehan Cui;Yong Li;Yungang Bao;Mingyu Chen;Chengyong Wu

  • Affiliations:
  • State Key Laboratory of Computer Architecture, ICT, CAS, China;State Key Laboratory of Computer Architecture, ICT, CAS, China/ Graduate School, CAS, China;Department of ECE, University of Pittsburgh;State Key Laboratory of Computer Architecture, ICT, CAS, China;State Key Laboratory of Computer Architecture, ICT, CAS, China;State Key Laboratory of Computer Architecture, ICT, CAS, China

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

The main memory system is a shared resource in modern multicore machines that can result in serious interference leading to reduced throughput and unfairness. Many new memory scheduling mechanisms have been proposed to address the interference problem. However, these mechanisms usually employ relative complex scheduling logic and need modifications to Memory Controllers (MCs), which incur expensive hardware design and manufacturing overheads. This article presents a practical software approach to effectively eliminate the interference without any hardware modifications. The key idea is to modify the OS memory management system and adopt a page-coloring-based Bank-level Partitioning Mechanism (BPM) that allocates dedicated DRAM banks to each core (or thread). By using BPM, memory requests from distinct programs are segregated across multiple memory banks to promote locality/fairness and reduce interference. We further extend BPM to BPM+ by incorporating channel-level partitioning, on which we demonstrate additional gain over BPM in many cases. To achieve benefits in the presence of diverse application memory needs and avoid performance degradation due to resource underutilization, we propose a dynamic mechanism upon BPM/BPM+ that assigns appropriate bank/channel resources based on application memory/bandwidth demands monitored through PMU (performance-monitoring unit) and a low-overhead OS page table scanning process. We implement BPM/BPM+ in Linux 2.6.32.15 kernel and evaluate the technique on four-core and eight-core real machines by running a large amount of randomly generated multiprogrammed and multithreaded workloads. Experimental results show that BPM/BPM+ can improve the overall system throughput by 4.7%/5.9%, on average, (up to 8.6%/9.5%) and reduce the unfairness by an average of 4.2%/6.1% (up to 15.8%/13.9%).