Reducing System Overheads in Home-based Software DSMs
IPPS '99/SPDP '99 Proceedings of the 13th International Symposium on Parallel Processing and the 10th Symposium on Parallel and Distributed Processing
Software DSMs can be categorized into homeless and home-based systems, each of which has strengths and weaknesses relative to the other. This paper introduces optimization methods that exploit the advantages and offset the disadvantages of the home-based protocol in the home-based software DSM JIAJIA. The first optimization reduces the overhead of writes to home pages through a lazy home page write detection scheme. The normal write detection scheme write-protects shared pages at the beginning of a synchronization interval, whereas lazy home page write detection delays write-protecting a home page until the page is first fetched in the interval, so home pages that are not cached by remote processors never need to be write-protected. The second optimization avoids fetching the whole page on a page fault by dividing each page into blocks and fetching only those blocks that are dirty with respect to the faulting processor. A write vector table is maintained for each shared page at its home to record, for each processor, which blocks have been modified since that processor last fetched the page. The third optimization adaptively migrates the home of a page to the processor that writes to it most frequently, reducing twin and diff overhead. Migration information is piggybacked on barrier messages, so the migration requires no additional communication. Performance evaluation with some well-accepted benchmarks and real applications shows that these optimizations can reduce page faults, message amounts, and diffs dramatically and consequently improve performance significantly.
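The write vector table behind the second optimization can be illustrated with a small sketch. This is not JIAJIA's code: the class `HomePage`, its methods, and the block count and block size are hypothetical names and values chosen for illustration, and the sketch assumes that a processor's first fetch transfers the whole page and that a writer's own copy already contains its write.

```python
from dataclasses import dataclass, field


@dataclass
class HomePage:
    """Home copy of one shared page, split into blocks (hypothetical sizes)."""
    num_procs: int
    num_blocks: int = 8
    block_size: int = 512
    data: bytearray = field(init=False)
    # write_vector[p][b] is True when block b has been modified
    # since processor p last fetched this page.
    write_vector: list = field(init=False)

    def __post_init__(self):
        self.data = bytearray(self.num_blocks * self.block_size)
        # Assumption: start all-dirty so a first fetch returns every block.
        self.write_vector = [[True] * self.num_blocks
                             for _ in range(self.num_procs)]

    def apply_write(self, writer, offset, payload):
        """Apply a write at the home and mark the touched blocks dirty
        for every processor except the writer."""
        if not payload:
            return
        end = offset + len(payload)
        assert 0 <= offset and end <= len(self.data), "write outside page"
        self.data[offset:end] = payload
        first, last = offset // self.block_size, (end - 1) // self.block_size
        for proc in range(self.num_procs):
            if proc == writer:
                continue  # assumed: the writer's copy already has this data
            for block in range(first, last + 1):
                self.write_vector[proc][block] = True

    def fetch(self, proc):
        """Return {block_index: bytes} for the blocks dirty with respect
        to proc, then clear proc's entries in the write vector."""
        dirty = {}
        for block, is_dirty in enumerate(self.write_vector[proc]):
            if is_dirty:
                start = block * self.block_size
                dirty[block] = bytes(self.data[start:start + self.block_size])
                self.write_vector[proc][block] = False
        return dirty
```

For example, with `page = HomePage(num_procs=3)`, the first `page.fetch(1)` returns all 8 blocks. After `page.apply_write(writer=0, offset=600, payload=b"abc")`, a second `page.fetch(1)` returns only block 1, and a third returns nothing. Keeping one vector per processor is what lets the home answer a fault with only the blocks that particular processor is missing.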