Distributed cooperative caching

Authors:
Enric Herrero;José González;Ramon Canal
Affiliations:
Universitat Politècnica de Catalunya, Barcelona, Spain;Intel Barcelona Research Center, Barcelona, Spain;Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Year:
2008

Citing 18
Cited 5

The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Simics: A Full System Simulation Platform

Computer
Orion: a power-performance simulator for interconnection networks

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Energy characterization of a tiled architecture processor with on-chip networks

Proceedings of the 2003 international symposium on Low power electronics and design
A New Scalable Directory Architecture for Large-Scale Multiprocessors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Maximizing CMP Throughput with Mediocre Cores

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Power/Performance/Thermal Design-Space Exploration for Multicore Architectures

IEEE Transactions on Parallel and Distributed Systems

Elastic cooperative caching: an autonomous dynamically adaptive memory hierarchy for chip multiprocessors

Proceedings of the 37th annual international symposium on Computer architecture
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Power-efficient spilling techniques for chip multiprocessors

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
The impact of solid state drive on search engine cache management

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Jigsaw: scalable software-defined caches

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents the Distributed Cooperative Caching, a scalable and energy-efficient scheme to manage chip multiprocessor (CMP) cache resources. The proposed configuration is based in the Cooperative Caching framework [3] but it is intended for large scale CMPs. Both centralized and distributed configurations have the advantage of combining the benefits of private and shared caches. In our proposal, the Coherence Engine has been redesigned to allow its partitioning and thus, eliminate the size constraints imposed by the duplication of all tags. At the same time, a global replacement mechanism has been added to improve the usage of cache space. Our framework uses several Distributed Coherence Engines spread across all the nodes to improve scalability. The distribution permits a better balance of the network traffic over the entire chip avoiding bottlenecks and increasing performance for a 32-core CMP by 21% over a traditional shared memory configuration and by 57% over the Cooperative Caching scheme. Furthermore, we have reduced the power consumption of the entire system by using a different tag allocation method and by reducing the number of tags compared on each request. For a 32-core CMP the Distributed Cooperative Caching framework provides an average improvement of the power/performance relation (MIPS3/W) of 3.66x over a traditional shared memory configuration and 4.30x over Cooperative Caching.