ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Lazy release consistency for software distributed shared memory
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Evaluation of release consistent software distributed shared memory on emerging network technology
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Virtual memory mapped network interface for the SHRIMP multicomputer
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Evaluating the performance of multithreading and prefetching in multiprocessors
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Sparks: coherence as an abstract type
Sparks: coherence as an abstract type
The DASH Prototype: Logic Overhead and Performance
IEEE Transactions on Parallel and Distributed Systems
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Improving Release-Consistent Shared Virtual Memory using Automatic Update
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Monitoring shared virtual memory performance on a Myrinet-based PC cluster
ICS '98 Proceedings of the 12th international conference on Supercomputing
Evaluation of hardware write propagation support for next-generation shared virtual memory clusters
ICS '98 Proceedings of the 12th international conference on Supercomputing
Data prefetching for software DSMs
ICS '98 Proceedings of the 12th international conference on Supercomputing
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Shared virtual memory with automatic update support
ICS '99 Proceedings of the 13th international conference on Supercomputing
Accelerating shared virtual memory via general-purpose network interface support
ACM Transactions on Computer Systems (TOCS)
Scalable fault-tolerant distributed shared memory
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Lazy Garbage Collection of Recovery State for Fault-Tolerant Distributed Shared Memory
IEEE Transactions on Parallel and Distributed Systems
P-RIO: A Modular Parallel-Programming Environment
IEEE Concurrency
Lazy Garbage Collection of Recovery State for Fault-Tolerant Distributed Shared Memory
IEEE Transactions on Parallel and Distributed Systems
The Affinity Entry Consistency Protocol
ICPP '97 Proceedings of the international Conference on Parallel Processing
A Comparison of Two Strategies of Dynamic Data Prefetching in Software DSM
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Journal of Parallel and Distributed Computing
Log-based rollback recovery without checkpoints of shared memory in software DSM
The Journal of Supercomputing
Addressing a workload characterization study to the design of consistency protocols
The Journal of Supercomputing
CycleMeter: detecting fraudulent peers in internet cycle sharing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
ALS'00 Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Region-Based Prefetch Techniques for Software Distributed Shared Memory Systems
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Optimizing a shared virtual memory system for a heterogeneous CPU-accelerator platform
ACM SIGOPS Operating Systems Review
Searching an optimal history size for history-based page prefetching on software DSM systems
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Hi-index | 0.00 |
In this paper we propose the use of a PCI-based programmable protocol controller for hiding communication and coherence overheads in software DSMs. Our protocol controller provides three different types of overhead tolerance: a) moving basic communication and coherence tasks away from computation processors; b) prefetching of diffs; and c) generating and applying diffs with hardware assistance. We evaluate the isolated and combined impact of these features on the performance of TreadMarks. We also compare performance against two versions of the Shrimp-based AURC protocol. Using detailed execution-driven simulations of a 16-node network of workstations, we show that the greatest performance benefits provided by our protocol controller come from our hardware-supported diffs. Reducing the burden of communication and coherence transactions on the computation processor is also beneficial but to a smaller extent. Prefetching is not always profitable. Our results show that our protocol controller can improve running time performance by up to 50% for TreadMarks, which means that it can double the TreadMarks speedups. The overlapping implementation of TreadMarks performs as well or better than AURC for 5 of our 6 applications. We conclude that the simple hardware support we propose allows for the implementation of high-performance software DSMs at low cost. Based on this conclusion, we are building the NCP2 parallel system at COPPE/UFRJ.