Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Authors:
Blas A. Cuesta;Alberto Ros;María E. Gómez;Antonio Robles;José F. Duato
Affiliations:
Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain;Universitat Politècnica de València, Valencia, Spain
Venue:
Proceedings of the 38th annual international symposium on Computer architecture
Year:
2011

Citing 21
Cited 16

Cache coherence directories for scalable multiprocessors

Cache coherence directories for scalable multiprocessors
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An empirical evaluation of two memory-efficient directory methods

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors

Proceedings of the 2002 international symposium on Low power electronics and design
Simics: A Full System Simulation Platform

Computer
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking

Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
TMA: a trap-based memory architecture

Proceedings of the 20th annual international conference on Supercomputing
Virtual hierarchies to support server consolidation

Proceedings of the 34th annual international symposium on Computer architecture
A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A case for low-complexity MP architectures

Proceedings of the 2007 ACM/IEEE conference on Supercomputing
IBM POWER6 microarchitecture

IBM Journal of Research and Development
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor

IEEE Micro
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques

End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
Practically private: enabling high performance CMPs through compiler-assisted data classification

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Complexity-effective multicore coherence

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
PS-Dir: a scalable two-level directory cache

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Detecting sharing patterns in industrial parallel applications for embedded heterogeneous multicore systems

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Spatiotemporal Coherence Tracking

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A new perspective for efficient virtual-cache coherence

Proceedings of the 40th Annual International Symposium on Computer Architecture
Non-race concurrency bug detection through order-sensitive critical sections

Proceedings of the 40th Annual International Symposium on Computer Architecture
Dynamic directories: a mechanism for reducing on-chip interconnect power in multicores

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
The case for a scalable coherence protocol for complex on-chip cache hierarchies in many core systems

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
PS-cache: an energy-efficient cache design for chip multiprocessors

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Towards efficient dynamic LLC home bank mapping with noc-level support

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Multi-grain coherence directories

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
DP&TB: a coherence filtering protocol for many-core chip multiprocessors

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

To meet the demand for more powerful high-performance shared-memory servers, multiprocessor systems must incorporate efficient and scalable cache coherence protocols, such as those based on directory caches. However, the limited directory cache size of the increasingly larger systems may cause frequent evictions of directory entries and, consequently, invalidations of cached blocks, which severely degrades system performance. A significant percentage of the referred memory blocks are only accessed by one processor (even in parallel applications) and, therefore, do not require coherence maintenance. Taking advantage of techniques that dynamically identify those private blocks, we propose to deactivate the coherence protocol for them and to treat them as uniprocessor systems do. The protocol deactivation allows directory caches to omit the tracking of an appreciable quantity of blocks, which reduces their load and increases their effective size. Since the operating system collaborates on the detection of private blocks, our proposal only requires minor modifications. Simulation results show that, thanks to our proposal, directory caches can avoid the tracking of about 57% of the accessed blocks and their capacity can be better exploited. This contributes either to shorten the runtime of parallel applications by 15% while keeping directory cache size or to maintain system performance while using directory caches 8 times smaller.