Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture

  • Authors:
  • Licheng Chen;Yongbing Huang;Yungang Bao;Onur Mutlu;Guangming Tan;Mingyu Chen

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Carnegie Mellon University, Pittsburgh, PA, USA;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • Proceedings of the international conference on Supercomputing
  • Year:
  • 2011

Abstract

In modern multi-core chip architectures, the DRAM system is shared by an increasing number of cores and high-bandwidth I/O devices. This trend makes request contention and unfairness more serious. Previous research has focused on memory scheduling mechanisms that serve the memory requests generated by multiple cores efficiently and fairly. However, the resulting performance gains are modest because of the limited bank-level parallelism in prevalent DRAM chips. Based on the observation that virtual channel memory (VCM) offers more opportunities to exploit memory-level parallelism (MLP), because a VCM chip has more channel buffers than a conventional DRAM chip has banks, we evaluate VCM as an alternative to conventional DRAM for addressing contention, unfairness, and limited MLP. In this work, we implement VCM and apply a state-of-the-art scheduling mechanism on a multi-core architecture. The experimental results show that (i) on a 16-core system, VCM with 32 channels improves the IPC of homogeneous workloads by 2.08X compared with a system using conventional DRAM chips, at an extra area cost of 0.5% and with dynamic and background power penalties of only 5.8% and 0.03%, respectively; and (ii) for heterogeneous workloads, VCM reduces unfairness by 82.0% and improves performance by 1.86X in terms of system throughput.
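The fairness and throughput results in (ii) are ratios computed from per-application IPC. The abstract does not define these metrics, so the following sketch assumes the definitions common in the multi-core memory-scheduling literature: slowdown is alone-run IPC divided by shared-run IPC, unfairness is the maximum slowdown divided by the minimum slowdown, and system throughput is weighted speedup (the sum of each application's shared IPC normalized to its alone IPC). The function names and all IPC values are hypothetical illustrations, not data or code from the paper.

```python
from typing import List, Sequence


def slowdowns(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> List[float]:
    """Per-application slowdown (assumed definition): alone IPC / shared IPC."""
    if len(ipc_alone) != len(ipc_shared) or not ipc_alone:
        raise ValueError("need one alone/shared IPC pair per application")
    return [a / s for a, s in zip(ipc_alone, ipc_shared)]


def unfairness(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """Max slowdown / min slowdown (assumed definition); 1.0 means perfectly fair."""
    sd = slowdowns(ipc_alone, ipc_shared)
    return max(sd) / min(sd)


def weighted_speedup(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """System throughput as weighted speedup (assumed): sum of shared IPC / alone IPC."""
    if len(ipc_alone) != len(ipc_shared) or not ipc_alone:
        raise ValueError("need one alone/shared IPC pair per application")
    return sum(s / a for a, s in zip(ipc_alone, ipc_shared))


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction, e.g. 0.5 means the metric was halved."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical 4-application mix; these numbers are illustrative only.
    alone = [2.0, 1.0, 1.5, 0.5]
    shared_baseline = [1.0, 0.25, 0.75, 0.25]   # one application hit much harder
    shared_improved = [1.5, 0.75, 1.125, 0.375]  # every application slowed equally

    u0 = unfairness(alone, shared_baseline)
    u1 = unfairness(alone, shared_improved)
    ws0 = weighted_speedup(alone, shared_baseline)
    ws1 = weighted_speedup(alone, shared_improved)
    print(f"unfairness {u0:.2f} -> {u1:.2f} ({relative_reduction(u0, u1):.0%} lower)")
    print(f"weighted speedup {ws0:.2f} -> {ws1:.2f} ({ws1 / ws0:.2f}X)")
```

In this hypothetical mix, equalizing the slowdowns drives unfairness to its floor of 1.0 while also raising weighted speedup. The two metrics are distinct, however, so a real system can improve one without the other.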