Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches have been used: the fine-grain approach that instruments application loads and stores to support a small coherence granularity, and the coarse-grain approach based on virtual memory hardware that provides coherence at a page granularity. Fine-grain systems offer a simple migration path for applications developed on hardware multiprocessors by supporting coherence protocols similar to those implemented in hardware. On the other hand, coarse-grain systems can potentially provide higher performance through more optimized protocols and larger transfer granularities, while avoiding instrumentation overheads. Numerous studies have examined each approach individually, but major differences in experimental platforms and applications make comparison of the approaches difficult.
This paper presents a detailed comparison of two mature systems, Shasta and Cashmere, representing the fine- and coarse-grain approaches, respectively. Both systems are tuned to run on the same commercially available, state-of-the-art cluster of AlphaServer SMPs connected via a Memory Channel network. As expected, our results show that Shasta provides robust performance for applications tuned for hardware multiprocessors, and can better tolerate fine-grain synchronization. In contrast, Cashmere is highly sensitive to fine-grain synchronization, but provides a performance edge for applications with coarse-grain behavior. Interestingly, we found that the performance gap between the systems can often be bridged by program modifications that address coherence and synchronization granularity.
In addition, our study reveals some unexpected results related to the interaction of current compiler technology with application instrumentation, and the ability of SMP-aware protocols to avoid certain performance disadvantages of coarse-grain approaches.