Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Variability in Architectural Simulations of Multi-Threaded Workloads
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
Proceedings of the 30th annual international symposium on Computer architecture
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Formal Verification and its Impact on the Snooping versus Directory Protocol Debate
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
An efficient cache design for scalable glueless shared-memory multiprocessors
Proceedings of the 3rd conference on Computing frontiers
Proceedings of the 33rd annual international symposium on Computer Architecture
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Corona: System Implications of Emerging Nanophotonic Technology
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Journal of Parallel and Distributed Computing
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
APPT '09 Proceedings of the 8th International Symposium on Advanced Parallel Processing Technologies
Direct coherence: bringing together performance and scalability in shared-memory multiprocessors
HiPC'07 Proceedings of the 14th international conference on High performance computing
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An adaptive cache coherence protocol for chip multiprocessors
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
A novel lightweight directory architecture for scalable shared-memory multiprocessors
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
A high performance adaptive miss handling architecture for chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
An optimized multicore cache coherence design for exploiting communication locality
Proceedings of the great lakes symposium on VLSI
Using in-flight chains to build a scalable cache coherence protocol
ACM Transactions on Architecture and Code Optimization (TACO)
Bandwidth Adaptive Cache Coherence Optimizations for Chip Multiprocessors
International Journal of Parallel Programming
Hi-index | 0.00 |
This paper advocates that cache coherence protocols use a bandwidth adaptive approach to adjust to varied system configurations (e.g., number of processors) and workload behaviors. We propose Bandwidth Adaptive Snooping Hybrid (BASH), a hybrid protocol that ranges from behaving like snooping (by broadcasting requests) when excess bandwidth is available to behaving like a directory protocol (by unicasting requests) when bandwidth is limited. BASH adapts dynamically by probabilistically deciding to broadcast or unicast on a per request basis using a local estimate of recent interconnection network utilization. Simulations of a microbenchmark and commercial and scientific workloads show that BASH robustly performs as well or better than the best of snooping and directory protocols as available bandwidth is varied. By mixing broadcasts and unicasts, BASH outperforms both snooping and directory protocols in the mid-range where a static choice of either is inefficient.