A compiler tool to predict memory hierarchy performance of scientific codes

Authors:
B. B. Fraguela; R. Doallo; J. Touriño; E. L. Zapata
Affiliations:
Depto. de Electrónica e Sistemas, Facultade de Informática, Universidade da Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain; Depto. de Electrónica e Sistemas, Facultade de Informática, Universidade da Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain; Depto. de Electrónica e Sistemas, Facultade de Informática, Universidade da Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain; Depto. de Arquitectura de Computadores, Complejo Tecnológico Campus de Teatinos, Universidad de Málaga, 29080 Málaga, Spain
Venue:
Parallel Computing
Year:
2004


Abstract

Understanding memory hierarchy behavior is essential, because it is critical to the performance of current systems. Of particular interest is the design of optimizing environments and compilers that guide the application of program transformations to improve cache performance with as little user intervention as possible. In this paper we introduce a fast analytical modeling technique, suitable for arbitrary set-associative caches with an LRU replacement policy, that overcomes weak points of other approaches found in the literature. The model was integrated into the Polaris parallelizing compiler to allow automated analysis of loop-oriented scientific codes and to drive code optimizations. Detailed validations using well-known benchmarks show that the model's predictions are usually very accurate and that the code optimizations it proposes are always, or nearly always, optimal.
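To make concrete the quantity such a model estimates, the following minimal sketch counts misses in a set-associative cache with LRU replacement for two loop orders over the same array. It is a small trace-driven simulation written for illustration, not the analytical model described in the paper; the function names, the row-major layout, and the cache parameters are all assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses for a byte-address trace in a set-associative LRU cache."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes          # memory line holding this address
        cache_set = sets[line % num_sets]  # set the line maps to
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[line] = None
    return misses


def row_sweep(n, elem_bytes=8):
    """Addresses for 'for i: for j: a[i][j]' on a row-major n x n array."""
    return [(i * n + j) * elem_bytes for i in range(n) for j in range(n)]


def column_sweep(n, elem_bytes=8):
    """Addresses for 'for j: for i: a[i][j]' on the same row-major array."""
    return [(i * n + j) * elem_bytes for j in range(n) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical cache: 1 KB, 32-byte lines, 2-way set-associative.
    for name, trace in (("row", row_sweep(64)), ("column", column_sweep(64))):
        print(name, count_misses(trace, cache_bytes=1024, line_bytes=32, ways=2))
```

With these assumed parameters the row-order sweep misses once per cache line (1024 misses), while the column-order sweep maps every access in a column to the same set and misses on all 4096 accesses. An analytical model aims to predict this kind of difference without executing or simulating the trace.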