Memory Latency-Tolerance Approaches for Itanium Processors: Out-of-Order Execution vs. Speculative Precomputation

  • Authors:
  • Ralph M. Kling

  • Affiliations:
  • -

  • Venue:
  • HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2002

Abstract

The performance of in-order execution Itanium(tm) processors can suffer significantly due to cache misses. Two memory latency tolerance approaches can be applied for the Itanium processors. One uses an out-of-order (OOO) execution core; the other assumes multithreading support and exploits cache prefetching via speculative precomputation (SP). This paper evaluates and contrasts these two approaches. In addition, this paper assesses the effectiveness of combining the two approaches. For a select set of memory-intensive programs, an in-order SMT Itanium processor using speculative precomputation can achieve performance improvement (92%) comparable to that of an out-of-order design (87%). Applying both OOO and SP yields a total performance improvement of 141% over the baseline in-order machine. OOO tends to be effective in prefetching for L1 misses, whereas SP is primarily good at covering L2 and L3 misses. Our analysis indicates that the two approaches can be redundant or complementary depending on the type of delinquent loads that each targets. Both approaches are effective on delinquent loads in the loop body; however, only SP is effective on delinquent loads found in loop control code.
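
The improvement figures quoted in the abstract can be turned into a rough estimate of how much the two approaches overlap. The sketch below is illustrative arithmetic only, not an analysis from the paper. It assumes that "an improvement of X%" means a speedup of 1 + X/100 over the in-order baseline, and that the time each technique removes can be treated like a set of stall cycles, so the shared part follows from inclusion-exclusion. The function name and the overlap measure are hypothetical.

```python
def time_saved_fraction(improvement_pct: float) -> float:
    """Fraction of baseline execution time removed.

    Assumption (not stated in the abstract): an improvement of X%
    corresponds to a speedup of 1 + X/100 over the baseline.
    """
    speedup = 1.0 + improvement_pct / 100.0
    return 1.0 - 1.0 / speedup


# Improvement figures quoted in the abstract.
sp_saved = time_saved_fraction(92)     # speculative precomputation alone
ooo_saved = time_saved_fraction(87)    # out-of-order core alone
both_saved = time_saved_fraction(141)  # OOO and SP combined

# Hypothetical overlap measure: baseline time that either technique
# would remove on its own (inclusion-exclusion over the savings).
overlap = sp_saved + ooo_saved - both_saved

print(f"SP alone removes   {sp_saved:.1%} of baseline time")
print(f"OOO alone removes  {ooo_saved:.1%} of baseline time")
print(f"OOO + SP removes   {both_saved:.1%} of baseline time")
print(f"Estimated overlap  {overlap:.1%} of baseline time")
```

Under these assumptions the combined saving is larger than either individual saving but well below their sum, which is consistent with the abstract's statement that the two approaches can be redundant on some delinquent loads and complementary on others.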