Using in-flight chains to build a scalable cache coherence protocol

Authors:
Samantika Subramaniam;Simon C. Steely;Will Hasenplaugh;Aamer Jaleel;Carl Beckmann;Tryggve Fossum;Joel Emer
Affiliations:
Intel Corporation;Intel Corporation;Intel Corporation and MIT;Intel Corporation;Intel Corporation;Intel Corporation;Intel Corporation and MIT
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

As microprocessor designs integrate more cores, the scalability of cache coherence protocols becomes a challenging problem. Most directory-based protocols avoid races by using blocking tag directories, which can degrade the performance of parallel applications. In this article, we first demonstrate quantitatively that state-of-the-art blocking protocols significantly constrain throughput at large core counts for several parallel applications. Nonblocking protocols address this throughput concern, but at the expense of scalability in the interconnection network or in the required resource overheads. To overcome this limitation, we enhance nonblocking directory protocols by migrating the point of service of responses. Our approach uses in-flight chains of cores making parallel memory requests to achieve scalability while maintaining high throughput. The proposed protocol, called chained cache coherence, can outperform blocking protocols by up to 20% on scientific applications and 12% on commercial applications. It also has low resource overheads and simple address-ordering requirements, making it both a high-performance and a scalable protocol. Furthermore, in-flight chains provide a scalable way to build hierarchical and nonblocking tag directories and to optimize communication latencies.
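
The abstract does not spell out the mechanism, so the following is only a toy sketch of one plausible reading of "migrating the point of service of responses". It assumes that the directory records just the most recent requester (the chain tail) for each address and points every new requester at the previous tail instead of blocking. The names `ChainedDirectory`, `request`, `build_chain`, and the `"MEM"` placeholder are hypothetical and do not come from the article; the sketch ignores coherence states, invalidations, and timing.

```python
class ChainedDirectory:
    """Toy nonblocking directory (assumed design, not the article's protocol).

    It tracks only the tail of each address's in-flight chain.
    """

    def __init__(self, memory_owner="MEM"):
        self.tail = {}                    # address -> most recent requester
        self.memory_owner = memory_owner  # placeholder supplier for the first request

    def request(self, addr, core):
        # The directory never stalls: it names the previous tail as the
        # supplier and immediately makes the new requester the tail.
        supplier = self.tail.get(addr, self.memory_owner)
        self.tail[addr] = core
        return supplier


def build_chain(directory, addr, requesters):
    """Return (supplier, receiver) hand-offs in the order data would flow."""
    return [(directory.request(addr, core), core) for core in requesters]


if __name__ == "__main__":
    # Three cores request the same line back to back.
    print(build_chain(ChainedDirectory(), 0x40, ["C0", "C1", "C2"]))
    # [('MEM', 'C0'), ('C0', 'C1'), ('C1', 'C2')]
```

In this reading, each response is serviced by the preceding core in the chain rather than by the directory, so the directory can accept the next request for the same address without waiting for the earlier one to complete.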