CMP cache performance projection: accessibility vs. capacity

  • Authors:
  • Xudong Shi;Feiqi Su;Jih-kwon Peir;Ye Xia;Zhen Yang

  • Affiliations:
  • University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL

  • Venue:
  • ACM SIGARCH Computer Architecture News
  • Year:
  • 2007

Abstract

Efficiently utilizing on-chip storage space on Chip-Multiprocessors (CMPs) has become an important research topic. Tradeoffs between data accessibility and effective on-chip capacity have been studied extensively, but understanding a wide spectrum of the design space requires costly simulations. In this paper, we first develop an abstract model for understanding the performance impact of data replication. To overcome the lack of real-time interactions among multiple cores in the abstract model, we propose a global stack simulation strategy to study the performance of a variety of cache organizations on CMPs. The global stack logically incorporates a shared stack and per-core private stacks to collect shared and private reuse (stack) distances for every memory reference in a single simulation pass. From the collected reuse distances, performance in terms of hits, misses, and average memory access time can be calculated for various cache organizations. Using a set of commercial multithreaded workloads, we verify the stack results against individual execution-driven simulations that consider realistic cache parameters and delays. The results show that stack simulations can accurately model the performance of various cache organizations. The single-pass stack simulation results demonstrate that the effectiveness of various techniques for optimizing CMP on-chip storage is closely related to the working sets of the workloads as well as to the total cache size.
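The single-pass idea described in the abstract can be illustrated with a minimal Python sketch of classic LRU stack-distance collection. This is not the authors' global stack implementation: it assumes fully associative LRU caches, a trace of hypothetical `(core, block)` pairs, and it ignores coherence invalidations, replication policies, and remote-cache hits. The function names and the latency values are assumptions chosen for illustration only.

```python
from collections import Counter


def stack_distance(stack, block):
    """Return the LRU stack depth of `block` (None on first touch)
    and move the block to the top of the stack."""
    try:
        depth = stack.index(block)
    except ValueError:
        depth = None          # cold reference: never seen before
    else:
        stack.pop(depth)
    stack.insert(0, block)    # most recently used block sits at index 0
    return depth


def collect_distances(trace):
    """One pass over a trace of (core, block) pairs.

    Returns two histograms of reuse distances: one measured on a single
    shared stack, one measured on per-core private stacks.
    """
    shared_stack = []
    private_stacks = {}       # core id -> that core's own LRU stack
    shared_hist = Counter()
    private_hist = Counter()
    for core, block in trace:
        shared_hist[stack_distance(shared_stack, block)] += 1
        own_stack = private_stacks.setdefault(core, [])
        private_hist[stack_distance(own_stack, block)] += 1
    return shared_hist, private_hist


def hits(hist, capacity):
    """References that hit in a fully associative LRU cache of
    `capacity` blocks: those with reuse distance below the capacity."""
    return sum(n for d, n in hist.items() if d is not None and d < capacity)


def amat(hist, capacity, hit_latency, miss_latency):
    """Average access time under hypothetical hit and miss latencies."""
    total = sum(hist.values())
    h = hits(hist, capacity)
    return (h * hit_latency + (total - h) * miss_latency) / total


# Hypothetical two-core trace for illustration.
trace = [(0, "A"), (1, "B"), (0, "A"), (1, "A"), (0, "C"), (1, "B")]
shared_hist, private_hist = collect_distances(trace)
for capacity in (1, 2, 3):
    print(capacity, hits(shared_hist, capacity), hits(private_hist, capacity))
```

Because the histograms are collected once, hit counts for any cache capacity follow from a simple sum, which is the property that lets a single pass stand in for many separate simulations under these simplifying assumptions.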