Shasta: a low overhead, software-only approach for supporting fine-grain shared memory

  • Authors:
  • Daniel J. Scales;Kourosh Gharachorloo;Chandramohan A. Thekkath

  • Affiliations:
  • Western Research Laboratory, Digital Equipment Corporation;Western Research Laboratory, Digital Equipment Corporation;Systems Research Center, Digital Equipment Corporation

  • Venue:
  • Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
  • Year:
  • 1996

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient cache coherence protocol that incorporates a number of optimizations, including support for multiple communication granularities and use of relaxed memory models. This system is fully functional and runs on a cluster of Alpha workstations.The primary focus of this paper is to describe the techniques used in Shasta to reduce the checking overhead for supporting fine granularity sharing in software. These techniques include careful layout of the shared address space, scheduling the checking code for efficient execution on modern processors, using a simple method that checks loads using only the value loaded, reducing the extra cache misses caused by the checking code, and combining the checks for multiple loads and stores. To characterize the effect of these techniques, we present detailed performance results for the SPLASH-2 applications running on an Alpha processor. Without our optimizations, the checking overheads are excessively high, exceeding 100% for several applications. However, our techniques are effective in reducing these overheads to a range of 5% to 35% for almost all of the applications. We also describe our coherence protocol and present some preliminary results on the parallel performance of several applications running on our workstation cluster. Our experience so far indicates that once the cost of checking memory accesses is reduced using our techniques, the Shasta approach is an attractive software solution for supporting a shared address space with fine-grain access to data.