Accurate prediction of the behavior of multithreaded applications in shared caches

Authors:
Diego Andrade; Basilio B. Fraguela; Ramón Doallo
Affiliations:
Dept. of Electronics and Systems, University of A Coruña, Spain; Dept. of Electronics and Systems, University of A Coruña, Spain; Dept. of Electronics and Systems, University of A Coruña, Spain
Venue:
Parallel Computing
Year:
2013

Abstract

Multicore processors are now the norm, and in many of them several cores share one or more levels of cache. When several cores cooperate in the parallel execution of an application, the theoretically expected performance gain can in some cases be reduced by a cache access bottleneck, because the data accessed by the different cores can interfere in the shared cache levels. In other cases, the gain can be increased by greater reuse of the data loaded into the cache. This paper presents an analytical model that predicts the behavior of shared caches when executing applications parallelized at the loop level. To the best of our knowledge, this is the first analytical model that tackles the behavior of multithreaded applications on realistic shared caches without requiring profiling. The experimental results show that the model's predictions are accurate, that they are computed very quickly, and that the model can help a compiler or programmer choose the best parallelization strategy.
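The two opposing effects described above (interference between threads in a shared cache, and extra reuse of data that one thread loads for another) can be illustrated with a small trace-driven simulation. This is not the paper's analytical model, which predicts misses without executing or profiling the code; it is a minimal sketch, assuming a 32 KiB, 8-way set-associative cache with 64-byte lines and LRU replacement, shared by two threads whose accesses are interleaved round-robin. The names `SharedLRUCache`, `sweep`, `run_interleaved` and `compare` are hypothetical.

```python
from collections import OrderedDict
from itertools import zip_longest

class SharedLRUCache:
    """Set-associative cache with LRU replacement, shared by all threads."""

    def __init__(self, size_bytes, line_bytes, assoc):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (line_bytes * assoc)
        # One OrderedDict per set; key order goes from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)  # hit: mark the line as most recently used
            return True
        self.misses += 1
        if len(s) >= self.assoc:
            s.popitem(last=False)  # set is full: evict the least recently used line
        s[line] = None
        return False

def sweep(base, n_elems, repeats, elem_bytes=8):
    """Address trace of a loop that reads an array of n_elems elements `repeats` times."""
    return [base + i * elem_bytes for _ in range(repeats) for i in range(n_elems)]

def run_interleaved(cache, traces):
    """Issue one access per thread per step, round-robin, and return total misses."""
    for step in zip_longest(*traces):
        for addr in step:
            if addr is not None:
                cache.access(addr)
    return cache.misses

CFG = dict(size_bytes=32 * 1024, line_bytes=64, assoc=8)

def compare(traces):
    """Return (sum of misses with each thread running alone, misses when sharing)."""
    alone = sum(run_interleaved(SharedLRUCache(**CFG), [t]) for t in traces)
    shared = run_interleaved(SharedLRUCache(**CFG), traces)
    return alone, shared

n = 24 * 1024 // 8  # 3072 8-byte elements = 24 KiB per array
# Disjoint arrays: each fits in the cache alone, but together they do not.
print(compare([sweep(0, n, 2), sweep(1 << 20, n, 2)]))  # (768, 1536)
# Both threads read the same array: lines loaded by one thread are reused by the other.
print(compare([sweep(0, n, 2), sweep(0, n, 2)]))  # (768, 384)
```

In the first case sharing doubles the misses, because the 12 lines that map to each set cycle through only 8 ways; in the second it halves them, because every line is fetched once for both threads. A simulation like this must replay the whole trace for each configuration, which is precisely the cost an analytical model avoids.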