All-window profiling and composable models of cache sharing

Authors:
Xiaoya Xiang; Bin Bao; Tongxin Bai; Chen Ding; Trishul Chilimbi
Affiliations:
University of Rochester, Rochester, NY, USA (Xiang, Bao, Bai, Ding); Microsoft Research, Redmond, WA, USA (Chilimbi)
Venue:
Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming (PPoPP '11)
Year:
2011

Abstract

As multi-core processors become commonplace and cloud computing gains acceptance, more applications run in a shared cache environment. Cache sharing depends on a concept called footprint, which depends on all cache accesses, not just cache misses. Previous work has recognized the importance of footprint but has not provided a method for measuring it accurately, mainly because complete measurement requires counting data accesses in all execution windows, which takes time quadratic in the length of a trace. The paper first presents an algorithm, efficient enough for offline use, that approximately measures the footprint with a guaranteed precision; the cost of the analysis can be adjusted by changing the precision. The paper then presents a composable model. For a set of programs, the model uses the all-window footprint of each program to predict its cache interference with the other programs without running them together. The paper evaluates the efficiency of all-window profiling using the SPEC 2000 benchmarks and compares the footprint interference model with a miss-rate-based model and with exhaustive testing.
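To make the two ideas in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the paper's approximation algorithm. The footprint of a window is the number of distinct data elements accessed in it, and the all-window (average) footprint fp(w) averages this over every window of length w. The first function computes fp(w) by brute force, which shows why complete measurement is quadratic in the trace length. The second function is a simplified footprint-based composition that assumes a fully associative LRU cache, two programs interleaved at equal access rates, and the average footprint used in place of its distribution. Under these assumptions, a reuse in program A hits in the shared cache only if A's own reuse distance plus B's footprint over the same interval stays below the cache size. All function names are hypothetical.

```python
def all_window_footprint(trace):
    """Brute-force average footprint fp(w) for every window length w.

    Enumerates all n*(n+1)/2 windows, so the cost is quadratic in the
    trace length n (the cost the paper's algorithm is designed to avoid).
    """
    n = len(trace)
    totals = [0] * (n + 1)  # totals[w] = sum of distinct counts over windows of length w
    for start in range(n):
        seen = set()
        for end in range(start, n):
            seen.add(trace[end])
            totals[end - start + 1] += len(seen)
    # There are n - w + 1 windows of length w.
    return {w: totals[w] / (n - w + 1) for w in range(1, n + 1)}


def reuse_profile(trace):
    """For each access, return (reuse_distance, reuse_time), or None on first access.

    reuse_distance = distinct elements accessed since the previous access to the
    same element; reuse_time = number of accesses from that previous access to this one.
    """
    last = {}
    profile = []
    for i, x in enumerate(trace):
        if x in last:
            p = last[x]
            profile.append((len(set(trace[p + 1:i])), i - p))
        else:
            profile.append(None)  # cold (first) access
        last[x] = i
    return profile


def predicted_misses(trace_a, trace_b, cache_size):
    """Hypothetical composition: predict A's misses when sharing an LRU cache with B.

    Assumes A and B run at equal access rates, so during a reuse interval of
    length t in A, B touches about fp_B(t) distinct elements.
    """
    fp_b = all_window_footprint(trace_b)
    misses = 0
    for entry in reuse_profile(trace_a):
        if entry is None:
            misses += 1  # cold miss
            continue
        rd, rt = entry
        w = min(rt, len(trace_b))  # clamp to B's trace length
        if rd + fp_b.get(w, 0.0) >= cache_size:
            misses += 1  # the combined distance exceeds capacity, so LRU evicts it
    return misses
```

Passing an empty second trace gives the solo LRU miss count, so for the trace `"abab"` with a two-element cache, `predicted_misses("abab", "", 2)` counts only the two cold misses. Running it alongside `"cdcd"` makes every reuse a miss. Using the average footprint keeps the model composable, because each program is profiled once, in isolation, but it ignores variation among windows of the same length.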