Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Annual review of computer science vol. 1, 1986
HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
A variable instruction stream extension to the VLIW architecture
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Processor coupling: integrating compile time and runtime scheduling for parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Hi-index | 0.00 |
We study the problem of scheduling a parallel computation so as to minimize the maximum number of data items extant at any point in the execution. Computations are expressed as directed graphs, where nodes represent primitive operations and arcs represent data dependences. The result of an operation is extant after the operation executes and until all immediate successors have begun execution. Our goal is to schedule computations so as to minimize both the maximum space required for extant data and the overall completion time.The classical problem of multiprocessor scheduling with precedence constraints is a special case of our problem, obtained by disregarding the data-space constraint. This special case is NP-complete for general graphs; a time-optimal multiprocessor scheduling algorithm is known only for the class of arbitrary trees. For this same class of arbitrary trees we present a multiprocesssor scheduling algorithm where the completion time is optimal within a constant factor, while the data-space size exceeds the optimal by a factor not greater than the number of processors.For an arbitrary n-node precedence tree T of in-degree Δ, we present: (1) an algorithm for evaluating the lower bound on the size of data space required for executing T, regardless of the completion time or number of processors; (2) a proof that the lower bound of Part 1 may be as large as (Δ - 1) lg n but not larger; (3) a single-processor schedule that executes T in time that equals the optimal, while creating the data space of size equal to the lower bound of Part 1; (4) an ω-processor schedule that executes T in time not exceeding three times the optimal, while creating the data space of size that exceeds the lower bound of Part 1 by a factor not greater than ω. (5) a proof that for every number of processors ω and for every 0 ε ≤ 1, there exist infinitely many trees such that every ω-processor schedule that executes any of these trees in time not exceeding (2 - ε) times the optimal requires a token space as large as that created by the schedule of Part 4, while the schedule of Part 4 executes every such tree in optimal time.The family of complete binary trees provides an example where our schedule achieves an exponential improvement in the size of the data space, compared to that of the classical time-optimal schedule.