Design and Implementation of an Efficient Thread Partitioning Algorithm

  • Authors:
  • José N. Amaral; Guang R. Gao; Erturk Dogan Kocalar; Patrick O'Neill; Xinan Tang

  • Venue:
  • ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
  • Year:
  • 2000

Abstract

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how can a program be partitioned into threads that exploit machine parallelism, tolerate latency, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that makes the best use of the multiple execution units on a single processing node and copes with long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet this challenge for a range of machine architecture models. A quantitative affinity heuristic guides the placement of operations into threads and addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple because it uses a time-ordered event list to account for the activities of the multiple execution units. We have implemented the proposed algorithm, and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness.
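
The abstract names two ingredients, an affinity heuristic and a time-ordered event list, without giving their definitions. The sketch below is only an illustration of how such pieces might fit together, not the paper's algorithm. Everything in it is an assumption made for the example: the function name `partition`, the input format (`deps`, `latency`), the affinity measure (the number of an operation's predecessors held by an idle thread), and the `min_affinity` threshold below which a new thread is opened.

```python
import heapq


def partition(deps, latency, num_units, min_affinity=1):
    """Illustrative greedy partitioner; returns (thread_of, start_time).

    deps:      op -> list of predecessor ops (hypothetical input format)
    latency:   op -> execution time in cycles
    num_units: number of execution units on the processing node
    """
    if num_units < 1:
        raise ValueError("need at least one execution unit")

    ops = list(latency)
    succs = {op: [] for op in ops}
    pending = {op: len(deps.get(op, ())) for op in ops}
    for op in ops:
        for pred in deps.get(op, ()):
            succs[pred].append(op)

    ready = [op for op in ops if pending[op] == 0]
    events = []        # time-ordered event list: min-heap of (finish_time, op)
    free_units = num_units
    thread_of, start_time = {}, {}
    thread_busy = []   # thread_busy[t] is True while thread t runs an op
    now = 0

    while ready or events:
        # Issue ready operations while an execution unit is free.
        while ready and free_units > 0:
            op = ready.pop(0)
            # Assumed affinity: predecessors of op held by each idle thread.
            affinity = {}
            for pred in deps.get(op, ()):
                t = thread_of[pred]
                if not thread_busy[t]:
                    affinity[t] = affinity.get(t, 0) + 1
            # Highest affinity wins; ties go to the lowest thread id.
            best = max(affinity, key=lambda t: (affinity[t], -t), default=None)
            if best is None or affinity[best] < min_affinity:
                best = len(thread_busy)   # favor parallelism: open a new thread
                thread_busy.append(False)
            thread_of[op] = best
            thread_busy[best] = True
            start_time[op] = now
            free_units -= 1
            heapq.heappush(events, (now + latency[op], op))

        # Advance the clock to the earliest completion event.
        now, done = heapq.heappop(events)
        free_units += 1
        thread_busy[thread_of[done]] = False
        for succ in succs[done]:
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)

    return thread_of, start_time
```

For a diamond-shaped graph with `deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}`, unit latencies, and two execution units, this sketch places `a`, `b`, and `d` in thread 0 and `c` in thread 1. The locality side of the trade-off keeps `b` with its producer, while the parallelism side sends `c` to a new thread because thread 0 is busy. The event heap is what keeps the bookkeeping short: each completion frees one unit and one thread, and releases any successors whose inputs are now available.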