Edge chasing delayed consistency: pushing the limits of weak memory models

Authors:
Harold W. Cain; Mikko H. Lipasti
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA; University of Wisconsin-Madison, Madison, WI, USA
Venue:
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
Year:
2012

Abstract

In shared-memory multiprocessors that use invalidation-based coherence protocols, cache misses caused by inter-processor communication are a dominant source of processor stall cycles for many applications. We explore a novel coherence protocol implementation, edge-chasing delayed consistency (ECDC), that mitigates some of the performance degradation caused by this class of misses. ECDC allows a processor to continue reading a cache line non-speculatively after receiving an invalidation from another core, without changing the consistency model offered to programmers. While the idea of using stale data for as long as possible is enticing, our study shows that the benefits of such delay are small, and that most of the benefit of delayed invalidation comes from mitigating false sharing rather than from any tolerance of races or an application's ability to consume stale data productively.
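The distinction the abstract draws between false sharing and productive use of stale data can be illustrated with a toy model. The sketch below is not the ECDC protocol, whose mechanism the abstract does not describe. It assumes a hypothetical reader-side cache line that keeps serving reads after a remote write, and it records which word offsets were written so that each later read can be classified. The names `DelayedLine`, `remote_write`, `read`, and the 8-word line size are all illustrative assumptions.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 8  # assumed line geometry, not taken from the paper


@dataclass
class DelayedLine:
    """Toy reader-side cache line whose invalidation is delayed (hypothetical)."""
    words: list
    # Word offsets written by another core since this copy was fetched.
    stale_words: set = field(default_factory=set)

    def remote_write(self, offset):
        # An invalidation arrives. Instead of dropping the line, the reader
        # keeps its copy and only notes which word is now out of date.
        self.stale_words.add(offset)

    def read(self, offset):
        # Return the locally cached value plus a label classifying the read.
        value = self.words[offset]
        if not self.stale_words:
            return value, "coherent"
        if offset in self.stale_words:
            # The reader consumes a value another core has overwritten.
            return value, "stale (true sharing)"
        # The write touched a different word: the miss an eager invalidation
        # would have caused is a false-sharing miss.
        return value, "false sharing avoided"


line = DelayedLine(words=list(range(WORDS_PER_LINE)))
line.remote_write(3)
print(line.read(0))  # word 0 was never written remotely
print(line.read(3))  # word 3 is out of date in this copy
```

In this model, the abstract's finding corresponds to most post-invalidation reads falling into the "false sharing avoided" category rather than the "stale (true sharing)" one.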