CMP cache performance projection: accessibility vs. capacity

Authors:
Xudong Shi;Feiqi Su;Jih-kwon Peir;Ye Xia;Zhen Yang
Affiliations:
University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL
Venue:
ACM SIGARCH Computer Architecture News
Year:
2007

Citing 24
Cited 2

An analytical cache model

ACM Transactions on Computer Systems (TOCS)
Evaluating Associativity in CPU Caches

IEEE Transactions on Computers
Implementing stack simulation for highly-associative memories

SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Set-associative cache simulation using generalized binomial trees

ACM Transactions on Computer Systems (TOCS)
Analytical cache models with applications to cache partitioning

ICS '01 Proceedings of the 15th international conference on Supercomputing
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
Stack Evaluation of Arbitrary Set-Associative Multiprocessor Caches

IEEE Transactions on Parallel and Distributed Systems
Automatic Analytical Modeling for the Estimation of Cache Misses

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Let's Study Whole-Program Cache Behaviour Analytically

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Efficient Analysis of Caching Systems

Efficient Analysis of Caching Systems
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Proceedings of the 32nd annual international symposium on Computer Architecture
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
StatCache: a probabilistic approach to efficient and accurate data locality analysis

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient Stack Simulation for Shared Memory Set-Associative Multiprocessor Caches

ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01

SHARP control: controlled shared cache management in chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Survey of scheduling techniques for addressing shared resources in multicore processors

ACM Computing Surveys (CSUR)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Efficient utilizing on-chip storage space on Chip-Multiprocessors (CMPs) has become an important research topic. Tradeoffs between data accessibility and effective on-chip capacity have been studied extensively. It requires costly simulations to understand a wide-spectrum of the design space. In this paper, we first develop an abstract model for understanding the performance impact with respect to data replication. To overcome the lack of real-time interactions among multiple cores in the abstract model, we propose a global stack simulation strategy to study the performance of a variety of cache organizations on CMPs. The global stack logically incorporates a shared stack and per-core private stacks to collect shared/private reuse (stack) distances for every memory reference in a single simulation pass. With the collected reuse distances, performance in terms of hits/misses and average memory access times can be calculated for various cache organizations. We verify the stack results against individual execution-driven simulations that consider realistic cache parameters and delays using a set of commercial multithreaded workloads. The results show that stack simulations can accurately model the performance of various cache organizations. The single-pass stack simulation results demonstrate that the effectiveness of various techniques for optimizing the CMP on-chip storage is closely related to the working sets of the workloads as well as to the total cache sizes.