Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors

  • Authors:
  • Tanping Wang;Filip Blagojevic;Dimitrios S. Nikolopoulos

  • Affiliations:
  • The College of William and Mary, McGlothlin-Street Hall, Williamsburg, VA (all authors)

  • Venue:
  • LCR '04: Proceedings of the 7th Workshop on Languages, Compilers, and Run-Time Support for Scalable Systems
  • Year:
  • 2004

Abstract

This paper presents runtime mechanisms that enable flexible use of speculative precomputation in conjunction with thread-level parallelism (TLP) on simultaneous multithreaded (SMT) processors. The mechanisms were implemented and evaluated on a real multi-SMT system. So far, speculative precomputation and TLP have been used separately on SMT processors, and no attempts have been made to compare and possibly combine the two techniques for further optimization. We present runtime support mechanisms for coordinating precomputation with its sibling computation, so that precomputation is regulated to avoid cache pollution while keeping a sufficient runahead distance from the targeted computation. We also present a task queue mechanism to orchestrate precomputation and TLP, so that both can be used in the same program. The mechanisms are motivated by the observation that different parts of a program may benefit from different modes of multithreaded execution. Furthermore, idle periods during TLP execution or sequential sections can be used for precomputation, and vice versa. We apply the mechanisms to loop-structured scientific codes. Our experimental results verify that neither technique (precomputation or TLP) in isolation achieves the best performance in all cases; an efficient combination of precomputation and TLP is most often the best solution.
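
The idea of regulating precomputation so that it stays ahead of the main computation without running too far ahead can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the mechanism from the paper: the function name `simulate_runahead`, the parameters `min_ahead`, `max_ahead`, and `helper_speed`, and the jump/stall policy are all hypothetical, and loop iterations stand in for the memory accesses a real helper thread would prefetch.

```python
def simulate_runahead(total_iters, min_ahead, max_ahead, helper_speed=2):
    """Simulate a helper (precomputation) thread regulated by a runahead window.

    Hypothetical policy, assumed for illustration only:
      - if the helper leads the main thread by fewer than min_ahead
        iterations, it jumps forward (its prefetches would arrive too late);
      - if it already leads by max_ahead iterations, it stalls
        (prefetching further could evict data the main thread still needs);
      - otherwise it advances by one iteration.

    Returns the helper's lead over the main thread, recorded once per
    main-thread iteration. The helper index is not clamped to total_iters,
    which a real implementation would have to handle.
    """
    main_iter = 0
    helper_iter = 0
    leads = []
    while main_iter < total_iters:
        # The helper gets helper_speed steps per main-thread iteration,
        # modeling that it executes only a slice of the loop body.
        for _ in range(helper_speed):
            lead = helper_iter - main_iter
            if lead < min_ahead:
                helper_iter = main_iter + min_ahead  # too close: jump ahead
            elif lead < max_ahead:
                helper_iter += 1                     # inside window: advance
            # else: at the window limit, stall this step
        leads.append(helper_iter - main_iter)
        main_iter += 1                               # main thread advances
    return leads


if __name__ == "__main__":
    print(simulate_runahead(total_iters=6, min_ahead=2, max_ahead=4))
```

In this toy model the helper's lead settles at the upper bound of the window, because the helper is assumed to be faster than the main thread; the lower bound only matters when the helper falls behind, for example at the start of a loop.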