Shared memory computing on clusters with symmetric multiprocessors and system area networks

  • Authors:
  • Leonidas Kontothanassis;Robert Stets;Galen Hunt;Umit Rencuzogullari;Gautam Altekar;Sandhya Dwarkadas;Michael L. Scott

  • Affiliations:
  • HP Labs, Cambridge, MA;Google, Inc., Mountain View, CA;Microsoft Research, Redmond, WA;VMware, Inc., Palo Alto, CA;University of California, Berkeley, Berkeley, CA;University of Rochester, Rochester, NY;University of Rochester, Rochester, NY

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 2005

Abstract

Cashmere is a software distributed shared memory (S-DSM) system designed for clusters of server-class machines. It is distinguished from most other S-DSM projects by (1) the effective use of fast user-level messaging, as provided by modern system-area networks, and (2) a “two-level” protocol structure that exploits hardware coherence within multiprocessor nodes. Fast user-level messages change the tradeoffs in coherence protocol design; they allow Cashmere to employ a relatively simple directory-based coherence protocol. Exploiting hardware coherence within SMP nodes improves overall performance when care is taken to avoid interference with inter-node software coherence.

We have implemented Cashmere on a Compaq AlphaServer/Memory Channel cluster, an architecture that provides fast user-level messages. Experiments indicate that a one-level version of the Cashmere protocol provides performance comparable to, or slightly better than, that of TreadMarks' lazy release consistency. Comparisons to Compaq's Shasta protocol also suggest that while fast user-level messages make finer-grain software DSMs competitive, VM-based systems continue to outperform software-based access control for applications without extensive fine-grain sharing.

Within the family of Cashmere protocols, we find that leveraging intranode hardware coherence provides a 37% performance advantage over a more straightforward one-level implementation. Moreover, contrary to our original expectations, noncoherent hardware support for remote memory writes, total message ordering, and broadcast provides comparatively little in the way of additional benefits over just fast messaging for our application suite.