Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

Authors:
Meng-Ju Wu;Donald Yeung
Affiliations:
University of Maryland at College Park;University of Maryland at College Park
Venue:
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Year:
2012

Citing 14
Cited 2

The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploring the Design Space of Future CMPs

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Miss Rate Prediction across All Program Inputs

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Maximizing CMP Throughput with Mediocre Cores

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Exploring the cache design space for large scale CMPs

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Using Pin as a memory reference generator for multiprocessor simulation

ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors

ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Accelerating multicore reuse distance analysis with sampling and parallelization

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Towards architecture independent metrics for multicore performance analysis

ACM SIGMETRICS Performance Evaluation Review
Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Linear-time Modeling of Program Working Set in Shared Cache

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Is reuse distance applicable to data locality analysis on chip multiprocessors?

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction

HOTL: a higher order theory of locality

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Toward application-specific memory reconfiguration for energy efficiency

E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache hierarchies employed in modern CPUs. In today's hierarchies, performance is determined by complicated thread interactions, such as interference in shared caches and replication and communication in private caches. Researchers normally perform extensive simulations to study these interactions, but this can be costly and not very insightful. An alternative is multicore reuse distance (RD) analysis, which can provide extremely rich information about multicore memory behavior. In this paper, we apply multicore RD analysis to better understand cache system design. We focus on loop-based parallel programs, an important class of programs for which RD analysis provides high accuracy. We propose a novel framework to identify optimal multicore cache hierarchies, and extract several new insights. We also characterize how the optimal cache hierarchies vary with core count and problem size.