Fast modeling of shared caches in multicore systems

Authors:
David Eklov;David Black-Schaffer;Erik Hagersten
Affiliations:
Uppsala University, Sweden;Uppsala University, Sweden;Uppsala University, Sweden
Venue:
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Year:
2011

Citing 17
Cited 8

Efficient simulation of caches under optimal replacement with applications to miss characterization

SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Simics: A Full System Simulation Platform

Computer
A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches

IEEE Transactions on Computers
Predicting whole-program locality through reuse distance analysis

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Proceedings of the 30th annual international symposium on Computer architecture
A First-Order Superscalar Processor Model

Proceedings of the 31st annual international symposium on Computer architecture
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Fast data-locality profiling of native execution

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
StatCache: a probabilistic approach to efficient and accurate data locality analysis

ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Locality approximation using time

Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Performance of multithreaded chip multiprocessors and implications for operating system design

ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
An Instruction Throughput Model of Superscalar Processors

IEEE Transactions on Computers
Sampling-based program locality approximation

Proceedings of the 7th international symposium on Memory management
A mechanistic performance model for superscalar out-of-order processors

ACM Transactions on Computer Systems (TOCS)
Evaluation techniques for storage hierarchies

IBM Systems Journal
Accelerating multicore reuse distance analysis with sampling and parallelization

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
StatCC: a statistical cache contention model

Proceedings of the 19th international conference on Parallel architectures and compilation techniques

On the accuracy of cache sharing models

ICPE '12 Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering
Phase guided profiling for fast cache modeling

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Cache Conscious Task Regrouping on Multicore Processors

CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Efficient techniques for predicting cache sharing and throughput

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Understanding fundamental design choices in single-ISA heterogeneous multicore architectures

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
HOTL: a higher order theory of locality

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A survey on cache tuning from a power/energy perspective

ACM Computing Surveys (CSUR)
Towards software performance engineering for multicore and manycore systems

ACM SIGMETRICS Performance Evaluation Review

Quantified Score

Hi-index	0.00

Visualization

Abstract

This work presents StatCC, a simple and efficient model for estimating the shared cache miss ratios of co-scheduled applications on architectures with a hierarchy of private and shared caches. StatCC leverages the StatStack cache model to estimate the co-scheduled applications' cache miss ratios from their individual memory reuse distance distributions, and a simple performance model that estimates their CPIs based on the shared cache miss ratios. These methods are combined into a system of equations that explicitly models the CPIs in terms of the shared miss ratios and can be solved to determine both. The result is a fast algorithm with a 2% error across the SPEC CPU2006 benchmark suite compared to a simulated in-order processor and a hierarchy of private and shared caches.