InvisiFence: performance-transparent memory ordering in conventional multiprocessors

  • Authors:
  • Colin Blundell;Milo M.K. Martin;Thomas F. Wenisch

  • Affiliations:
  • University of Pennsylvania, Philadelphia, PA, USA;University of Pennsylvania, Philadelphia, PA, USA;University of Michigan, Ann Arbor, MI, USA

  • Venue:
  • Proceedings of the 36th annual international symposium on Computer architecture
  • Year:
  • 2009

Quantified Score

Hi-index 0.02

Visualization

Abstract

A multiprocessor's memory consistency model imposes ordering constraints among loads, stores, atomic operations, and memory fences. Even for consistency models that relax ordering among loads and stores, ordering constraints still induce significant performance penalties due to atomic operations and memory ordering fences. Several prior proposals reduce the performance penalty of strongly ordered models using post-retirement speculation, but these designs either (1) maintain speculative state at a per-store granularity, causing storage requirements to grow proportionally to speculation depth, or (2) employ distributed global commit arbitration using unconventional chunk-based invalidation mechanisms. In this paper we propose InvisiFence, an approach for implementing memory ordering based on post-retirement speculation that avoids these concerns. InvisiFence leverages minimalistic mechanisms for post-retirement speculation proposed in other contexts to (1) track speculative state efficiently at block-granularity with dedicated storage requirements independent of speculation depth, (2) provide fast commit by avoiding explicit commit arbitration, and (3) operate under a conventional invalidation-based cache coherence protocol. InvisiFence supports both modes of operation found in prior work: speculating only when necessary to minimize the risk of rollback-inducing violations or speculating continuously to decouple consistency enforcement from the processor core. Overall, InvisiFence requires approximately one kilobyte of additional state to transform a conventional multiprocessor into one that provides performance-transparent memory ordering, fences, and atomic operations.