The Manchester prototype dataflow computer. Communications of the ACM, special section on computer architecture.
A maximally pipelined tridiagonal linear equation solver. Journal of Parallel and Distributed Computing.
Annual Review of Computer Science, vol. 1, 1986.
Arrays, non-determinism, side-effects, and parallelism: a functional perspective. Proceedings of a workshop on graph reduction.
Control of parallelism in the Manchester Dataflow Machine. Proceedings of a conference on functional programming languages and computer architecture.
The search for performance in scientific processors: the Turing Award lecture. Communications of the ACM.
Resource requirements of dataflow programs. ISCA '88: Proceedings of the 15th Annual International Symposium on Computer Architecture.
Software pipelining: an effective scheduling technique for VLIW machines. PLDI '88: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation.
An efficient pipelined dataflow processor architecture. Proceedings of the 1988 ACM/IEEE Conference on Supercomputing.
Algorithmic aspects of balancing techniques for pipelined data flow code generation. Journal of Parallel and Distributed Computing.
Communications of the ACM.
A Fortran compiler for the FPS-164 scientific computer. SIGPLAN '84: Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction.
MICRO 14: Proceedings of the 14th Annual Workshop on Microprogramming.
Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture, which employs conventional pipelined techniques to achieve fast instruction execution while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must make balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine.

A detailed analysis based on simulation results is presented, focusing on two key architectural factors: the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other hand, the experiments have also revealed the (somewhat pessimistic) fact that even fully software-pipelined code may not achieve good performance if the overhead of fine-grain synchronization exceeds the capacity of the machine.
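As a rough illustration of the general idea (a toy sketch, not the paper's McGill Dataflow Architecture code or compiler output), software pipelining overlaps stages of successive loop iterations so that the "compute" of one iteration proceeds while the "load" of the next is in flight. The stage functions below are hypothetical placeholders standing in for real pipeline stages:

```python
# Toy two-stage software pipeline: while iteration i-1 is in its
# compute stage, iteration i is already in its load stage.
# stage_load / stage_compute are illustrative placeholders only.

def stage_load(i):
    return i * 2            # stand-in for fetching iteration i's operand

def stage_compute(x):
    return x + 1            # stand-in for the computation on that operand

def pipelined_loop(n):
    results = []
    loaded = stage_load(0)                   # prologue: start iteration 0
    for i in range(1, n):
        # steady state: overlap compute of iteration i-1 with load of i
        results.append(stage_compute(loaded))
        loaded = stage_load(i)
    results.append(stage_compute(loaded))    # epilogue: drain the pipeline
    return results
```

The prologue/steady-state/epilogue structure is the hallmark of a software-pipelined loop; in a dataflow setting, the overlap is enforced by fine-grain synchronization on operand availability rather than by a fixed static schedule.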