The benefits and deficiencies of shared and private caches are well documented, but the performance impact of privatizing or sharing caches in homogeneous multi-core architectures is less well understood. This paper investigates the performance impact of cache sharing on a homogeneous, same-ISA 16-core processor with private first-level (L1) caches by considering three cache models that vary the sharing of the second-level (L2) and third-level (L3) cache banks. Across many scenarios, the average memory access time of the privatized configurations improves relative to the shared ones as the L1 miss rate and/or the cross-partition interconnect latencies increase. Under a uniform memory address distribution, and when the L3 miss rate is close to zero, privatizing both the L2 and L3 caches performs best among the three models. Furthermore, we show mathematically that when the interconnect's bridge latency is below 264 cycles, privatizing only the L2 caches beats privatizing both L2 and L3 caches, while the reverse holds for the large bridge latencies representative of high-traffic, heavy workloads. For low to moderate interconnect latencies, and when the L3 miss rate is not close to zero, sharing both L2 and L3 banks among all cores performs best, followed by privatizing the L2s, with privatizing both L2s and L3s ranking last. Under worst-case address distributions the benefits of privatization generally grow, and with large bridge latencies privatizing the L2 and L3 banks outperforms the other cache models. These results suggest that as application workloads become heavier over time, producing higher cache miss rates and longer bridge and interconnect delays, privatizing the L2 and L3 caches may prove beneficial, whereas under less stressful workloads sharing both levels has the upper hand.
This study thus motivates making the sharing degree of the cache hierarchy configurable and flexible, so that it can adapt to the running workload.
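The trade-off the abstract describes can be sketched with a simple average-memory-access-time (AMAT) model: shared banks offer larger effective capacity (lower miss rates) but pay a cross-partition bridge latency on lower-level accesses, while private banks avoid the bridge on hits at the cost of higher miss rates. The latencies and miss rates below are illustrative assumptions, not the parameters measured in the paper, so the crossover point here is hypothetical and will not match the 264-cycle threshold derived in the text.

```python
def amat(l1_hit, l1_mr, l2_lat, l2_mr, l3_lat, l3_mr, mem_lat):
    """Average memory access time (cycles) for a three-level hierarchy."""
    return l1_hit + l1_mr * (l2_lat + l2_mr * (l3_lat + l3_mr * mem_lat))

def shared_l2_l3(bridge):
    # Shared L2/L3 banks: lower miss rates from pooled capacity, but every
    # L2/L3 access may cross the interconnect bridge.  (Assumed numbers.)
    return amat(1, 0.10, 10 + bridge, 0.20, 30 + bridge, 0.05, 200)

def private_l2_l3(bridge):
    # Private L2/L3 banks: no bridge on local hits, but smaller per-core
    # capacity raises miss rates; only memory traffic crosses the bridge.
    return amat(1, 0.10, 10, 0.35, 30, 0.15, 200 + bridge)

if __name__ == "__main__":
    # With a cheap bridge, sharing wins; as the bridge latency grows, the
    # private model's AMAT overtakes it -- the qualitative trend above.
    for b in (0, 50, 300):
        print(f"bridge={b:3d}  shared={shared_l2_l3(b):6.2f}  "
              f"private={private_l2_l3(b):6.2f}")
```

Varying the bridge latency in this toy model reproduces the qualitative crossover the paper analyzes: the shared model's cost grows linearly with the bridge delay on every lower-level access, while the private model only pays it on memory traffic.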