Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

  • Authors:
  • Dongkeun Kim;Steve Shih-wei Liao;Perry H. Wang;Juan del Cuvillo;Xinmin Tian;Xiang Zou;Hong Wang;Donald Yeung;Milind Girkar;John P. Shen

  • Affiliations:
  • -;-;-;-;-;-;-;-;-;-

  • Venue:
  • Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Pre-execution techniques have received much attention as aneffective way of prefetching cache blocks to tolerate the ever-increasingmemory latency. A number of pre-execution techniquesbased on hardware, compiler, or both have been proposed andstudied extensively by researchers. They report promising resultson simulators that model a Simultaneous Multithreading (SMT)processor. In this paper, we apply the helper threading idea ona real multithreaded machine, i.e., Intel Pentium 4 processor withHyper-Threading Technology, and show that indeed it can providewall-clock speedup on real silicon. To achieve further performanceimprovements via helper threads, we investigate threehelper threading scenarios that are driven by automated compilerinfrastructure, and identify several key challenges and opportunitiesfor novel hardware and software optimizations. Our studyshows a program behavior changes dynamically during execution.In addition, the organizations of certain critical hardware structuresin the hyper-threaded processors are either shared or partitionedin the multi-threading mode and thus, the tradeoffs regardingresource contention can be intricate. Therefore, it is essentialto judiciously invoke helper threads by adapting to the dynamicprogram behavior so that we can alleviate potential performancedegradation due to resource contention. Moreover, since adaptingto the dynamic behavior requires frequent thread synchronization,having light-weight thread synchronization mechanisms is important.