Automatic measurement of instruction cache capacity

Authors:
Kamen Yotov;Sandra Jackson;Tyler Steele;Keshav Pingali;Paul Stodghill
Affiliations:
Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY
Venue:
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Year:
2005

Citing 7
Cited 3

Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Measuring Cache and TLB Performance and Their Effect of Benchmark Run

Measuring Cache and TLB Performance and Their Effect of Benchmark Run
Automatic measurement of memory hierarchy parameters

SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
X-Ray: A Tool for Automatic Measurement of Hardware Parameters

QEST '05 Proceedings of the Second International Conference on the Quantitative Evaluation of Systems
mhz: anatomy of a micro-benchmark

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference
lmbench: portable tools for performance analysis

ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference

Investigating Cache Parameters of x86 Family Processors

Proceedings of the 2009 SPEC Benchmark Workshop on Computer Performance Evaluation and Benchmarking
BlackjackBench: portable hardware characterization

Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
BlackjackBench: portable hardware characterization

ACM SIGMETRICS Performance Evaluation Review

Quantified Score

Hi-index	0.00

Visualization

Abstract

There is growing interest in autonomic computing systems that can optimize their own behavior on different platforms without manual intervention. Examples of successful self-optimizing systems are ATLAS, which generates Basic Linear Algebra Subroutine (BLAS) Libraries, and FFTW, which generates FFT libraries. Self-optimizing systems may need the values of hardware parameters such as the number of registers of various types and the capacities of caches at various levels. For example, ATLAS uses the capacity of the L1 cache and the number of registers in determining the size of cache tiles and register tiles. We have built a system called X-Ray, which uses micro-benchmarks to measure such parameter values automatically. The micro-benchmarks currently implemented in X-Ray can determine the latency of various instructions, the existence of important instructions like fused multiply-add, the number of registers of various kinds, and parameters of the memory hierarchy. In this paper, we discuss how X-Ray determines the capacity of the instruction cache (I-cache), which is needed for important optimizations such as loop unrolling. We present the micro-benchmark used in X-Ray to measure I-cache capacity, the experimental methodology used to obtain accurate estimates, and experimental results on a large number of current platforms.