Software distributed shared memory (DSM) platforms on networks of workstations tolerate large network latencies by employing one of several weak memory consistency models. Data-race-tolerant applications, such as Genetic Algorithms (GAs) and Probabilistic Inference, offer an additional degree of freedom to tolerate network latency: they do not synchronize shared memory references and behave correctly when supplied outdated shared data. However, these algorithms often have a high communication-to-computation ratio and can flood the network with messages in the presence of large message delays. We study the performance of controlled asynchronous implementations of these algorithms via our previously proposed blocking Global Read memory access primitive. Global Read implements non-strict cache coherence by guaranteeing to return to the reader a shared datum value from within a specified staleness range. Experiments on an IBM SP2 multicomputer and on an Ethernet network of workstations show significant performance improvements for controlled asynchronous implementations. On a lightly loaded Ethernet network, most of the GA benchmarks see 30% to 40% improvement over the best competitor for 2 to 16 processors, while two of the Probabilistic Inference benchmarks see more than 80% improvement for two processors. As the network load increases, the benefits of non-strict cache coherence increase significantly.
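The core idea of a blocking Global Read with a staleness bound can be sketched as follows. This is a minimal single-process illustration, not the paper's DSM implementation: the class name, the version-counter staleness metric, and the `refresh` step standing in for a coherence message are all assumptions made for the example.

```python
import threading

class StaleBoundedCell:
    """Sketch of a shared datum whose reads tolerate bounded staleness.

    global_read() returns the locally cached copy as long as it is at
    most `max_staleness` writes behind the latest value; otherwise it
    blocks until the cache is refreshed. This loosely mimics a blocking
    Global Read primitive enforcing a staleness range.
    """

    def __init__(self, value, max_staleness=2):
        self.value = value            # latest written value ("home" copy)
        self.version = 0              # version of the latest write
        self.cached_value = value     # reader's cached copy
        self.cached_version = 0       # version held in the reader's cache
        self.max_staleness = max_staleness
        self.cond = threading.Condition()

    def write(self, value):
        # A writer updates the home copy without waiting for readers.
        with self.cond:
            self.version += 1
            self.value = value
            self.cond.notify_all()

    def refresh(self):
        # Simulate a coherence message bringing the cache up to date.
        with self.cond:
            self.cached_value = self.value
            self.cached_version = self.version
            self.cond.notify_all()

    def global_read(self):
        # Serve the (possibly stale) cached copy if it is within the
        # staleness bound; block for a refresh only when it is not.
        with self.cond:
            while self.version - self.cached_version > self.max_staleness:
                self.cond.wait()
            return self.cached_value

# Usage: reads tolerate up to 2 missed writes before blocking.
cell = StaleBoundedCell(0, max_staleness=2)
cell.write(1)
cell.write(2)
stale = cell.global_read()   # cache is 2 writes behind: still allowed
cell.write(3)                # now 3 behind: a read would block
cell.refresh()               # coherence update arrives
fresh = cell.global_read()
```

The point of the sketch is the `while` loop in `global_read`: a conventional strict read would always wait for the newest version, whereas here the reader proceeds with outdated data whenever the staleness gap is acceptable, trading accuracy for fewer blocking waits, which is exactly the freedom data-race-tolerant algorithms exploit.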