This paper presents simulated results comparing representatives of two approaches to software DSM: an object-based protocol and a page-based protocol. We explore the performance implications of each approach, including the object approach's advantages in bandwidth consumption and lack of false sharing.

Somewhat surprisingly, the locality and data aggregation advantages of page-based systems prove to be the dominant factors with typical operating system overheads. We show that large page sizes actually improve the performance of multiwriter protocols, primarily because validating a single object validates all other objects on the same page as well. Since our applications have significant spatial locality, these additional validations reduce the number of remote misses without significantly increasing bandwidth requirements. For three out of the four applications we tested, our page-based protocol matched or outperformed our object-based protocol under typical operating system costs.

We quantify this effect, and conclude with a discussion of techniques that could allow each approach to benefit from the best features of the other.
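
The aggregation effect described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator or either of its protocols. It only counts first-touch (cold) remote misses for one node reading a sequence of objects, comparing a fetch unit of one object with a fetch unit of one page. The function name `count_remote_misses`, the 64-byte object size, the 4096-byte page size, and the sequential access pattern are all assumptions chosen for illustration, and objects are assumed to be aligned so that none straddles a page boundary.

```python
def count_remote_misses(accesses, object_size, unit_size):
    """Count cold misses when data is validated in units of unit_size bytes.

    accesses: sequence of object indices touched by a single node.
    A miss occurs the first time the unit containing an object is touched;
    after that, every object in the same unit is considered valid.
    """
    valid_units = set()
    misses = 0
    for obj_index in accesses:
        unit = (obj_index * object_size) // unit_size
        if unit not in valid_units:
            valid_units.add(unit)
            misses += 1
    return misses


# Hypothetical parameters, not taken from the paper.
OBJECT_SIZE = 64      # bytes per object
PAGE_SIZE = 4096      # bytes per page (64 objects per page)

# A sequential sweep over 256 objects: strong spatial locality.
accesses = list(range(256))

# Object-based: each object is its own validation unit.
object_misses = count_remote_misses(accesses, OBJECT_SIZE, OBJECT_SIZE)
# Page-based: validating one object validates the whole page.
page_misses = count_remote_misses(accesses, OBJECT_SIZE, PAGE_SIZE)

object_bytes = object_misses * OBJECT_SIZE
page_bytes = page_misses * PAGE_SIZE

print(f"object-based: {object_misses} misses, {object_bytes} bytes")
print(f"page-based:   {page_misses} misses, {page_bytes} bytes")
```

With this fully sequential pattern the two schemes move the same number of bytes, but the page-based scheme pays the per-miss overhead far less often. With a sparse pattern such as `[0, 64, 128]`, each access lands on a different page, so the page-based scheme takes the same number of misses while transferring far more data, which is the bandwidth advantage the abstract attributes to the object approach.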