Efficient shared memory with minimal hardware support

  • Authors:
  • Leonidas I. Kontothanassis; Michael L. Scott

  • Affiliations:
  • Department of Computer Science, University of Rochester, Rochester, NY; Department of Computer Science, University of Rochester, Rochester, NY

  • Venue:
  • ACM SIGARCH Computer Architecture News
  • Year:
  • 1995

Abstract

Shared memory is widely regarded as a more intuitive model than message passing for the development of parallel programs. A shared memory model can be provided by hardware, software, or some combination of both. One of the most important problems to be solved in shared memory environments is that of cache coherence. Experience indicates, unsurprisingly, that hardware-coherent multiprocessors greatly outperform distributed shared-memory (DSM) emulations on message-passing hardware. Intermediate options, however, have received considerably less attention. We argue in this position paper that one such option can provide most of the performance of hardware cache coherence at little more monetary or design cost than traditional DSM systems: a multiprocessor or network that provides a global physical address space in which processors can make non-coherent accesses to remote memory without trapping into the kernel or interrupting remote processors. To support this claim, we have developed the Cashmere family of software coherence protocols for NCC-NUMA (Non-Cache-Coherent, Non-Uniform-Memory Access) systems, and have used execution-driven simulation to compare the performance of these protocols to that of full hardware coherence and distributed shared memory emulation. We have found that for a large class of applications the performance of NCC-NUMA multiprocessors rivals that of fully hardware-coherent designs and significantly surpasses the performance realized on more traditional DSM systems.