Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors

  • Authors:
  • Shekhar Srikantaiah, Mahmut Kandemir

  • Venue:
  • MICRO '43: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2010

Abstract

Translation Lookaside Buffers (TLBs) are vital hardware support for virtual memory management in high-performance computer systems and have a significant influence on overall system performance. Numerous techniques for reducing TLB miss latency, including the effects of TLB size, associativity, multilevel hierarchies, superpages, and prefetching, have been well studied in the context of uniprocessors. However, with Chip Multiprocessors (CMPs) becoming the standard design point of processor architectures, it is imperative to revisit the design and organization of TLBs in the context of CMPs. In this paper, we propose to improve system performance through a novel way of organizing TLBs, called Synergistic TLBs. Synergistic TLBs differ from a per-core private TLB organization in three ways: (i) they provide capacity sharing across TLBs by allowing victim translations evicted from one TLB to be stored in another, emulating a distributed shared TLB (DST); (ii) they support translation migration to maximize the utilization of TLB capacity; and (iii) they support translation replication to avoid excess latency on remote TLB accesses. We explore the points in this design space and find that an optimal point exists for high-performance address translation. Our evaluation with both multiprogrammed (SPEC 2006) and multithreaded (PARSEC) workloads shows that Synergistic TLBs eliminate, on average, 44.3% and 31.2% of TLB misses, respectively. They also improve the weighted speedup of multiprogrammed application mixes by 25.1% and the performance of multithreaded applications by 27.3%, on average.
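
To make the three mechanisms concrete, the sketch below models them at a purely behavioral level. It is not the authors' hardware design: the names (SynergisticTLBs, CoreTLB, access, _insert_with_spill), the fully associative LRU TLBs, and the round-robin victim-placement policy are illustrative assumptions; a real implementation would involve set-associative structures and latency-aware placement, migration, and replication policies.

```python
# Behavioral sketch of the three Synergistic TLB mechanisms named in the
# abstract: capacity sharing via victim spilling, translation migration,
# and translation replication. All structural details are assumptions.

from collections import OrderedDict


class CoreTLB:
    """A per-core TLB, modeled as fully associative with LRU replacement (simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, ordered by recency of use

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # refresh LRU position
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        """Insert a translation; return the evicted (victim) entry, if any."""
        victim = None
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            victim = self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        return victim

    def remove(self, vpn):
        return self.entries.pop(vpn, None)


class SynergisticTLBs:
    """Private per-core TLBs cooperating to emulate a distributed shared TLB."""

    def __init__(self, num_cores, capacity_per_core):
        self.tlbs = [CoreTLB(capacity_per_core) for _ in range(num_cores)]

    def access(self, core, vpn, replicate=True):
        """Translate vpn on behalf of `core`; returns (ppn, where it hit)."""
        ppn = self.tlbs[core].lookup(vpn)
        if ppn is not None:
            return ppn, "local hit"

        # Capacity sharing: probe the other cores' TLBs before walking the
        # page table, so the private TLBs behave like one distributed shared TLB.
        for other, tlb in enumerate(self.tlbs):
            if other == core:
                continue
            ppn = tlb.lookup(vpn)
            if ppn is not None:
                if replicate:
                    # Replication: keep a local copy to avoid future remote-access latency.
                    self._insert_with_spill(core, vpn, ppn)
                else:
                    # Migration: move the translation to the requesting core's TLB.
                    tlb.remove(vpn)
                    self._insert_with_spill(core, vpn, ppn)
                return ppn, f"remote hit in core {other}"

        # True miss: a page-table walk would supply the translation here.
        ppn = self._page_table_walk(vpn)
        self._insert_with_spill(core, vpn, ppn)
        return ppn, "miss (page walk)"

    def _insert_with_spill(self, core, vpn, ppn):
        """Insert locally; spill the local victim into a neighboring TLB (capacity sharing)."""
        victim = self.tlbs[core].insert(vpn, ppn)
        if victim is not None:
            neighbor = (core + 1) % len(self.tlbs)  # simple placement policy for illustration
            self.tlbs[neighbor].insert(*victim)     # the victim may in turn evict at the neighbor

    def _page_table_walk(self, vpn):
        return vpn + 0x1000  # stand-in for a real page-table walk


if __name__ == "__main__":
    tlbs = SynergisticTLBs(num_cores=4, capacity_per_core=2)
    for vpn in [1, 2, 3, 1]:
        # Core 0 touches pages 1, 2, 3; the insert of page 3 spills the victim
        # (page 1) into core 1's TLB, so the final access to page 1 hits remotely.
        print("core 0, vpn", vpn, "->", tlbs.access(0, vpn))
    # Core 1 finds page 3 in core 0's TLB and replicates it locally.
    print("core 1, vpn 3 ->", tlbs.access(1, 3))
```

Passing replicate=False in this sketch exercises the migration path instead of replication; the abstract's point is precisely that the balance between replication (lower remote-access latency) and migration/sharing (higher effective capacity) admits an optimal design point.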