In shared-memory multiprocessors that use invalidation-based coherence protocols, cache misses caused by inter-processor communication are a dominant source of processor stall cycles for many applications. We explore a novel coherence protocol implementation, edge-chasing delayed consistency (ECDC), that mitigates some of the performance degradation caused by this class of misses. ECDC allows a processor to continue reading a cache line non-speculatively after it receives an invalidation from another processor, without changing the consistency model offered to programmers. Although using stale data for as long as possible is an enticing idea, our study shows that the benefits of such delay are small. Most of the benefit of delaying invalidations comes from mitigating false sharing, rather than from any tolerance of races or from an application's ability to consume stale data productively.
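The false-sharing effect described above can be illustrated with a toy model. The sketch below is not the ECDC protocol and does not model edge chasing. It assumes a single two-word cache line, a write-through memory, and a hypothetical `sync` operation at which a delayed invalidation is finally applied. All names (`CacheLine`, `simulate`, `false_sharing_trace`) are invented for illustration. Processor A repeatedly writes word 0 while processor B repeatedly reads word 1 of the same line.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 2  # toy line size (assumption, not taken from the paper)


@dataclass
class CacheLine:
    # "invalid": must refetch; "valid": up to date;
    # "stale": invalidation received but delayed, still readable
    state: str = "invalid"
    data: list = field(default_factory=lambda: [0] * WORDS_PER_LINE)
    misses: int = 0


def simulate(trace, delay_invalidations):
    """Replay (proc, op, word, value) tuples on one shared line.

    Returns (misses per processor, list of (proc, word, value_read)).
    """
    memory = [0] * WORDS_PER_LINE
    caches = {}
    values_read = []

    def fetch(line):
        # Count a miss and copy the whole line from memory.
        line.misses += 1
        line.data = list(memory)
        line.state = "valid"

    for proc, op, word, value in trace:
        line = caches.setdefault(proc, CacheLine())
        if op == "sync":
            # Hypothetical synchronization point: apply a delayed invalidation.
            if line.state == "stale":
                line.state = "invalid"
        elif op == "write":
            if line.state != "valid":
                fetch(line)
            line.data[word] = value
            memory[word] = value  # write-through, to keep the toy simple
            for other, other_line in caches.items():
                if other != proc and other_line.state == "valid":
                    other_line.state = "stale" if delay_invalidations else "invalid"
        elif op == "read":
            if line.state == "invalid":
                fetch(line)
            # A stale line is read without a miss when invalidations are delayed.
            values_read.append((proc, word, line.data[word]))

    return {p: l.misses for p, l in caches.items()}, values_read


def false_sharing_trace(n):
    """A writes word 0 and B reads word 1, n times: no true communication."""
    trace = []
    for i in range(1, n + 1):
        trace.append(("A", "write", 0, i))
        trace.append(("B", "read", 1, None))
    return trace


if __name__ == "__main__":
    trace = false_sharing_trace(100)
    print("eager invalidation:  ", simulate(trace, delay_invalidations=False)[0])
    print("delayed invalidation:", simulate(trace, delay_invalidations=True)[0])
```

In this toy trace, B takes one miss per iteration under eager invalidation and only its initial cold miss when invalidations are delayed, yet it reads the same values in both runs because word 1 is never written. That is the false-sharing case. The model says nothing about how a real protocol decides when a stale line must finally be discarded.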