Speculative precomputation: long-range prefetching of delinquent loads

  • Authors:
  • Jamison D. Collins (Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA)
  • Hong Wang (Microprocessor Research Lab, Intel Corporation, Santa Clara, CA)
  • Dean M. Tullsen (Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA)
  • Christopher Hughes (Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL)
  • Yong-Fong Lee (Microcomputer Software Lab, Intel Corporation, Santa Clara, CA)
  • Dan Lavery (Microcomputer Software Lab, Intel Corporation, Santa Clara, CA)
  • John P. Shen (Microprocessor Research Lab, Intel Corporation, Santa Clara, CA)

  • Venue:
  • ISCA '01: Proceedings of the 28th Annual International Symposium on Computer Architecture
  • Year:
  • 2001

Abstract

This paper explores Speculative Precomputation, a technique that uses idle thread contexts in a multithreaded architecture to improve the performance of single-threaded applications. It attacks program stalls from data cache misses by pre-computing future memory accesses in available thread contexts and prefetching these data. The technique is evaluated by simulating the performance of a research processor based on the Itanium™ ISA supporting Simultaneous Multithreading. Two primary forms of Speculative Precomputation are evaluated. If only the non-speculative thread spawns speculative threads, performance gains of up to 30% are achieved when assuming ideal hardware. However, this speedup drops considerably with more realistic hardware assumptions. Permitting speculative threads to directly spawn additional speculative threads reduces the overhead associated with spawning threads and enables significantly more aggressive speculation, overcoming this limitation. Even with realistic costs for spawning threads, speedups as high as 169% are achieved, with an average speedup of 76%.
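
To give a concrete, if loose, software analogue of the idea, the sketch below uses a POSIX thread as the "speculative" context: it executes only the backward slice that generates the addresses of a delinquent load (here, a pointer-chasing traversal) and issues prefetches, while the non-speculative thread performs the real computation. This is purely illustrative; the paper's mechanism relies on spare hardware thread contexts and hardware-assisted spawning on an SMT Itanium research processor, not on pthreads, and the names used here (precompute_slice, main_work) are invented for the example.

/* Illustrative sketch only: the paper's Speculative Precomputation runs in
 * spare SMT hardware thread contexts, not software threads. This pthreads
 * analogue just shows the shape of a precomputation slice. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NODES (1 << 20)

struct node {
    struct node *next;
    long payload;
};

static struct node *head;

/* Speculative slice: executes only the address-generating chain of the
 * delinquent load and prefetches each node; it produces no program results. */
static void *precompute_slice(void *arg)
{
    (void)arg;
    for (struct node *p = head; p != NULL; p = p->next)
        __builtin_prefetch(p->next, 0, 1);
    return NULL;
}

/* Non-speculative thread: the real work, which ideally now hits in cache. */
static long main_work(void)
{
    long sum = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        sum += p->payload;              /* the delinquent load */
    return sum;
}

int main(void)
{
    struct node *nodes = malloc(NODES * sizeof *nodes);
    if (nodes == NULL)
        return 1;
    for (long i = 0; i < NODES; i++) {
        nodes[i].next = (i + 1 < NODES) ? &nodes[i + 1] : NULL;
        nodes[i].payload = i;
    }
    head = nodes;

    pthread_t slice;
    pthread_create(&slice, NULL, precompute_slice, NULL); /* "spawn" the slice */
    printf("sum = %ld\n", main_work());
    pthread_join(slice, NULL);
    free(nodes);
    return 0;
}

Compiled with something like gcc -O2 -pthread, the benefit of such a sketch depends on the slice staying far enough ahead of the main thread; keeping slices timely is exactly what the paper's spawning mechanisms (including speculative threads spawning further speculative threads) are designed to address.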