Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
The need to provide performance guarantees in high-performance servers has long been neglected. Providing such guarantees in current and future servers is difficult because fine-grain resources, such as on-chip caches, are shared by multiple processors or thread contexts. Although inter-thread cache sharing generally improves the overall throughput of the system, the impact of cache contention on the threads that share the cache is highly non-uniform: some threads may be slowed down significantly, while others are barely affected. This can cause severe performance problems, such as sub-optimal throughput, cache thrashing, and starvation of threads that fail to occupy sufficient cache space to make good progress. Clearly, this situation is undesirable when performance guarantees must be provided, such as in utility computing servers. Unfortunately, no existing model allows extensive investigation of the impact of cache sharing. To enable such a study, we propose an inductive probability model that predicts the impact of cache sharing on co-scheduled threads. The input to the model is the isolated L2 circular-sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the model is the number of extra L2 cache misses each thread incurs due to cache sharing. We validate the model against a cycle-accurate simulation of a dual-core Chip Multi-Processor (CMP) architecture on fourteen pairs of mostly SPEC benchmarks. The model achieves an average error of only 3.9%.
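The circular-sequence profile the abstract names as the model's input is closely related to the classic per-set LRU stack-distance profile: a thread's reuse of a cache line hits in an A-way set exactly when fewer than A distinct lines were accessed in that set since the previous access to the same line. The sketch below (with hypothetical helper names; it is not the paper's implementation, and the cache parameters are made-up defaults) shows how such an isolated profile could be gathered from an address trace and how isolated misses follow from it under LRU:

```python
from collections import defaultdict

def stack_distance_profile(addresses, num_sets=16, line_size=64):
    """Per-set LRU stack-distance histogram from an address trace.

    Illustrative sketch only: the paper's circular-sequence profile
    records, for each (d, n), same-set access sequences touching d
    distinct lines whose first and last access hit the same line.
    Here we record the simpler per-reuse stack distance: the number
    of distinct same-set lines touched since the last access to the
    reused line, counting the line itself. A distance of None marks
    a cold (first-touch) access.
    """
    profile = defaultdict(int)   # stack distance -> access count
    stacks = defaultdict(list)   # set index -> LRU stack of line tags
    for addr in addresses:
        line = addr // line_size
        stack = stacks[line % num_sets]
        if line in stack:
            d = stack.index(line) + 1   # 1-based position = stack distance
            stack.remove(line)
        else:
            d = None                    # cold miss: never seen in this set
        stack.insert(0, line)           # promote to MRU position
        profile[d] += 1
    return dict(profile)

def isolated_misses(profile, assoc=8):
    """Under LRU, a reuse misses iff its stack distance exceeds assoc;
    cold accesses (distance None) always miss."""
    return sum(c for d, c in profile.items() if d is None or d > assoc)
```

In the paper's model, the interesting step beyond this is estimating how many accesses from a co-scheduled thread interleave into each circular sequence, effectively inflating d past the associativity and converting isolated hits into extra shared-cache misses; that inductive probability computation is not reproduced here.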