Memory performance analysis of SPEC2000C for the Intel(R) Itanium(TM) processor

  • Authors:
  • M. J. Serrano;Youfeng Wu

  • Affiliations:
  • Intel Labs., Intel Corp., Santa Clara, CA, USA;Intel Labs., Intel Corp., Santa Clara, CA, USA

  • Venue:
  • WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
  • Year:
  • 2001

Abstract

We describe our memory performance analysis of SPEC2000C using the newly released Intel(R) Itanium(TM) processor (IPF). Memory overhead is very significant for SPEC2000C; on average, 39% of cycles are spent in data stalls. Cache misses are significant, and data translation (DTLB) performance also affects many benchmarks. Our study is based on measurements collected from the hardware performance counters and on cache profiling through program instrumentation of loads and stores. We define important loads as the load sites that contribute at least 95% of the cache misses at all levels. Our measurements show that the number of important loads in a program is relatively small. Our analysis shows that important loads are most often contained in inner loops, and that the trip counts of these loops are high. We present preliminary results on using stride profiling to reduce cache misses of important loads, yielding an average improvement of 6% on SPEC2000C. Finally, we present our study of data translation performance and propose design choices.
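
As an illustration of the "important loads" definition above, here is a minimal sketch in Python. It is not the authors' tooling: the function name `important_loads`, the per-site miss counts, and the site labels are all hypothetical, and the sketch assumes a single cache level and a greedy ranking by miss count, whereas the abstract speaks of misses at all levels.

```python
from typing import Dict, List


def important_loads(misses_by_site: Dict[str, int], coverage: float = 0.95) -> List[str]:
    """Pick load sites, highest miss count first, until they cover
    at least `coverage` of all recorded cache misses (assumed selection rule)."""
    total = sum(misses_by_site.values())
    if total == 0:
        return []
    # Rank load sites by descending miss count.
    ranked = sorted(misses_by_site.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    covered = 0
    for site, misses in ranked:
        if covered >= coverage * total:
            break
        selected.append(site)
        covered += misses
    return selected


# Hypothetical profile: cache misses per load site (made-up numbers).
profile = {
    "loop_a:ld1": 700,
    "loop_a:ld2": 200,
    "loop_b:ld1": 60,
    "init:ld3": 30,
    "main:ld4": 10,
}
print(important_loads(profile))
```

With these made-up counts, three of the five load sites already account for 96% of the misses, which mirrors the abstract's observation that the set of important loads tends to be small.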