There is growing interest in autonomic computing systems that can optimize their own behavior on different platforms without manual intervention. Examples of successful self-optimizing systems are ATLAS, which generates Basic Linear Algebra Subprograms (BLAS) libraries, and FFTW, which generates FFT libraries. Self-optimizing systems may need the values of hardware parameters such as the number of registers of various types and the capacities of caches at various levels. For example, ATLAS uses the capacity of the L1 cache and the number of registers to determine the sizes of cache tiles and register tiles. We have built a system called X-Ray, which uses micro-benchmarks to measure such parameter values automatically. The micro-benchmarks currently implemented in X-Ray can determine the latency of various instructions, the existence of important instructions such as fused multiply-add, the number of registers of various kinds, and parameters of the memory hierarchy. In this paper, we discuss how X-Ray determines the capacity of the instruction cache (I-cache), which is needed for important optimizations such as loop unrolling. We present the micro-benchmark used in X-Ray to measure I-cache capacity, the experimental methodology used to obtain accurate estimates, and experimental results on a large number of current platforms.
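
To illustrate how a measured cache capacity might feed into tile-size selection, the sketch below computes the largest square tile of double-precision values that fits in an L1 data cache of a given capacity. This is only a rule of thumb under stated assumptions, not the procedure used by ATLAS or X-Ray: the function name `max_square_tile`, the 8-byte element size, and the requirement that a single NB x NB tile fit in the cache are all illustrative choices.

```python
import math


def max_square_tile(l1_bytes: int, elem_bytes: int = 8) -> int:
    """Return the largest NB such that an NB x NB tile fits in l1_bytes.

    Hypothetical helper for illustration only. It assumes one square tile
    of elem_bytes-sized elements must fit entirely in the cache, i.e.
    NB * NB * elem_bytes <= l1_bytes.
    """
    if l1_bytes <= 0 or elem_bytes <= 0:
        raise ValueError("cache capacity and element size must be positive")
    # Number of whole elements the cache can hold, then the integer square root.
    return math.isqrt(l1_bytes // elem_bytes)


if __name__ == "__main__":
    # Capacities here are example inputs, not measurements from the paper.
    for capacity_kib in (16, 32, 64):
        nb = max_square_tile(capacity_kib * 1024)
        print(f"L1 = {capacity_kib} KiB -> NB <= {nb}")
```

A real self-optimizing system would likely treat such a bound as the starting point or upper limit of an empirical search rather than as a final answer, since associativity, line size, and the other operands of the kernel all reduce the usable capacity.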