Automatic measurement of instruction cache capacity

  • Authors:
  • Kamen Yotov;Sandra Jackson;Tyler Steele;Keshav Pingali;Paul Stodghill

  • Affiliations:
  • Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY;Department of Computer Science, Cornell University, Ithaca, NY

  • Venue:
  • LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

There is growing interest in autonomic computing systems that can optimize their own behavior on different platforms without manual intervention. Examples of successful self-optimizing systems are ATLAS, which generates Basic Linear Algebra Subroutine (BLAS) Libraries, and FFTW, which generates FFT libraries. Self-optimizing systems may need the values of hardware parameters such as the number of registers of various types and the capacities of caches at various levels. For example, ATLAS uses the capacity of the L1 cache and the number of registers in determining the size of cache tiles and register tiles. We have built a system called X-Ray, which uses micro-benchmarks to measure such parameter values automatically. The micro-benchmarks currently implemented in X-Ray can determine the latency of various instructions, the existence of important instructions like fused multiply-add, the number of registers of various kinds, and parameters of the memory hierarchy. In this paper, we discuss how X-Ray determines the capacity of the instruction cache (I-cache), which is needed for important optimizations such as loop unrolling. We present the micro-benchmark used in X-Ray to measure I-cache capacity, the experimental methodology used to obtain accurate estimates, and experimental results on a large number of current platforms.