Software distributed shared memory (DSM) platforms on networks of workstations tolerate large network latencies by employing one of several weak memory consistency models. Data-race-tolerant applications, such as Genetic Algorithms (GAs) and Probabilistic Inference, offer an additional degree of freedom to tolerate network latency: they do not synchronize shared memory references and behave correctly when supplied outdated shared data. However, these algorithms often have a high communication-to-computation ratio and can flood the network with messages in the presence of large message delays. We study the performance of controlled asynchronous implementations of these algorithms via our previously proposed blocking Global Read memory access primitive. Global Read implements non-strict cache coherence by guaranteeing to return to the reader a shared datum value from within a specified staleness range. Experiments on an IBM SP2 multicomputer and on an Ethernet network of workstations show significant performance improvements for controlled asynchronous implementations. On a lightly loaded Ethernet network, most of the GA benchmarks see 30% to 40% improvement over the best competitor for 2 to 16 processors, while two of the Probabilistic Inference benchmarks see more than 80% improvement for two processors. As the network load increases, the benefits of non-strict cache coherence increase significantly.
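The core idea of a blocking Global Read with a staleness bound can be sketched as follows. This is a minimal single-process illustration, not the paper's DSM implementation: the class name, the version-counter staleness metric, and the `refresh` step standing in for a coherence message are all assumptions made for the example.

```python
import threading

class StaleBoundedCell:
    """Sketch of a shared datum whose reads tolerate bounded staleness.

    global_read() returns the locally cached copy as long as it is at
    most `max_staleness` writes behind the latest value; otherwise it
    blocks until the cache is refreshed. This loosely mimics a blocking
    Global Read primitive enforcing a staleness range.
    """

    def __init__(self, value, max_staleness=2):
        self.value = value            # latest written value ("home" copy)
        self.version = 0              # version of the latest write
        self.cached_value = value     # reader's cached copy
        self.cached_version = 0       # version held in the reader's cache
        self.max_staleness = max_staleness
        self.cond = threading.Condition()

    def write(self, value):
        # A writer updates the home copy without waiting for readers.
        with self.cond:
            self.version += 1
            self.value = value
            self.cond.notify_all()

    def refresh(self):
        # Simulate a coherence message bringing the cache up to date.
        with self.cond:
            self.cached_value = self.value
            self.cached_version = self.version
            self.cond.notify_all()

    def global_read(self):
        # Serve the (possibly stale) cached copy if it is within the
        # staleness bound; block for a refresh only when it is not.
        with self.cond:
            while self.version - self.cached_version > self.max_staleness:
                self.cond.wait()
            return self.cached_value

# Usage: reads tolerate up to 2 missed writes before blocking.
cell = StaleBoundedCell(0, max_staleness=2)
cell.write(1)
cell.write(2)
stale = cell.global_read()   # cache is 2 writes behind: still allowed
cell.write(3)                # now 3 behind: a read would block
cell.refresh()               # coherence update arrives
fresh = cell.global_read()
```

The point of the sketch is the `while` loop in `global_read`: a conventional strict read would always wait for the newest version, whereas here the reader proceeds with outdated data whenever the staleness gap is acceptable, trading accuracy for fewer blocking waits, which is exactly the freedom data-race-tolerant algorithms exploit.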