Design and Implementation of an Efficient Thread Partitioning Algorithm

Authors:
José N. Amaral;Guang R. Gao;Erturk Dogan Kocalar;Patrick O'Neill;Xinan Tang
Affiliations:
-;-;-;-;-
Venue:
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Year:
2000

Citing 7
Cited 0

List scheduling with and without communication delays

Parallel Computing
Polling watchdog: combining polling and interrupts for efficient message handling

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A study of the EARTH-MANNA multithreaded system

International Journal of Parallel Programming - Special issue on parallel architectures and compilation techniques—part II
Thread partitioning and scheduling based on cost model

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Partitioning and Scheduling Parallel Programs for Multiprocessors

Partitioning and Scheduling Parallel Programs for Multiprocessors
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
Compiling C for the EARTH Multithreaded Architecture

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

The development of fine-grain multi-threaded program execution models has created an interesting challenge: how to partition a program into threads that can exploit machine parallelism, achieve latency tolerance, and maintain reasonable locality of reference? A successful algorithm must produce a thread partition that best utilizes multiple execution units on a single processing node and handles long and unpredictable latencies. In this paper, we introduce a new thread partitioning algorithm that can meet the above challenge for a range of machine architecture models. A quantitative affinity heuristic is introduced to guide the placement of operations into threads. This heuristic addresses the trade-off between exploiting parallelism and preserving locality. The algorithm is surprisingly simple due to the use of a time-ordered event list to account for the multiple execution unit activities. We have implemented the proposed algorithm and our experiments, performed on a wide range of examples, have demonstrated its efficiency and effectiveness.