The benefits and deficiencies of shared and private caches are well documented, but the performance impact of privatizing or sharing caches in homogeneous multi-core architectures is less well understood. This paper investigates the performance impact of cache sharing on a homogeneous, same-ISA 16-core processor with private first-level (L1) caches by considering three cache models that vary the sharing of the second-level (L2) and third-level (L3) cache banks. Across many scenarios, the average memory access time of the privatized configurations improves relative to the shared ones as the L1 miss rate and/or the cross-partition interconnect latencies increase. Under a uniform memory address distribution, and when the L3 miss rate is close to zero, privatizing both the L2 and L3 caches performs best among the three models. Furthermore, we show mathematically that when the interconnect's bridge latency is below 264 cycles, privatizing only the L2 caches beats privatizing both L2 and L3 caches, while the reverse holds for the large bridge latencies representative of high-traffic, heavy workloads. For low to moderate interconnect latencies, and when the L3 miss rate is not close to zero, sharing both L2 and L3 banks among all cores performs best, followed by privatizing the L2s, with privatizing both L2s and L3s ranking last. Under worst-case address distributions the benefits of privatization generally grow, and with large bridge latencies privatizing the L2 and L3 banks outperforms the other cache models. These results suggest that as application workloads become heavier over time, producing higher cache miss rates and longer bridge and interconnect delays, privatizing the L2 and L3 caches may prove beneficial, whereas under less stressful workloads sharing both levels has the upper hand.
This study thus motivates making the sharing degree of the cache hierarchy configurable and flexible, so that it can adapt to the running workload.
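The trade-off the abstract describes can be sketched with a simple average-memory-access-time (AMAT) model: shared banks offer larger effective capacity (lower miss rates) but pay a cross-partition bridge latency on lower-level accesses, while private banks avoid the bridge on hits at the cost of higher miss rates. The latencies and miss rates below are illustrative assumptions, not the parameters measured in the paper, so the crossover point here is hypothetical and will not match the 264-cycle threshold derived in the text.

```python
def amat(l1_hit, l1_mr, l2_lat, l2_mr, l3_lat, l3_mr, mem_lat):
    """Average memory access time (cycles) for a three-level hierarchy."""
    return l1_hit + l1_mr * (l2_lat + l2_mr * (l3_lat + l3_mr * mem_lat))

def shared_l2_l3(bridge):
    # Shared L2/L3 banks: lower miss rates from pooled capacity, but every
    # L2/L3 access may cross the interconnect bridge.  (Assumed numbers.)
    return amat(1, 0.10, 10 + bridge, 0.20, 30 + bridge, 0.05, 200)

def private_l2_l3(bridge):
    # Private L2/L3 banks: no bridge on local hits, but smaller per-core
    # capacity raises miss rates; only memory traffic crosses the bridge.
    return amat(1, 0.10, 10, 0.35, 30, 0.15, 200 + bridge)

if __name__ == "__main__":
    # With a cheap bridge, sharing wins; as the bridge latency grows, the
    # private model's AMAT overtakes it -- the qualitative trend above.
    for b in (0, 50, 300):
        print(f"bridge={b:3d}  shared={shared_l2_l3(b):6.2f}  "
              f"private={private_l2_l3(b):6.2f}")
```

Varying the bridge latency in this toy model reproduces the qualitative crossover the paper analyzes: the shared model's cost grows linearly with the bridge delay on every lower-level access, while the private model only pays it on memory traffic.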