On Parallel MIMD Computation: HEP Supercomputer and Its Applications
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Synchronizing processors through memory requests in a tightly coupled multiprocessor
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer Architecture
Optimizing Supercompilers for Supercomputers
Parallel Programming and Compilers
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
GTS: Extracting Full Parallelism Out of DO Loops
PARLE '89 Proceedings of the Parallel Architectures and Languages Europe, Volume II: Parallel Languages
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs
IEEE Transactions on Parallel and Distributed Systems
Effects of Loop Fusion and Statement Migration on the Speedup of Vector Multiprocessors
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
In this paper we present a new method for extracting the maximum parallelism or vector operations out of DO loops with tight recurrences written in sequential programming languages. We have named the method Graph Traverse Scheduling (GTS). It is devised to produce code for shared-memory multiprocessors or vector machines. When parallelizing, hardware support for fast synchronization is assumed.

The method is presented for singly nested loops containing one or several recurrences, and we show how both parallel and vector code are generated. Based on the dependence graph of a loop, we first evaluate its parallelism and the vector length of its statements. We then apply GTS either to distribute loop iterations among tasks or to generate vector operations of a given length. When the method is applied for parallel code generation, dependences not enforced by the sequential execution of each task must be explicitly synchronized; a method to minimize the number of explicit synchronizations is also presented. We also show how to compute the synchronization-free parallelism, obtaining fully independent tasks. When GTS is applied for vector code generation, a sequential loop of vector operations is obtained.
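The synchronization-free case described in the abstract can be illustrated with a small sketch (the function names and the single-recurrence example loop are ours, not taken from the paper). For a loop whose only dependence is a recurrence of constant distance d, such as a[i] = a[i-d] + b[i], the dependence graph is a single self-cycle of distance d, so the iterations split into d chains by residue modulo d; iterations within a chain must run in order, but the chains carry no dependence between them and can run as fully independent tasks:

```python
def chain_partition(n, d):
    """Group iterations i = d .. n-1 of the loop by i mod d.

    For the recurrence a[i] = a[i-d] + b[i], each group is a chain whose
    iterations must execute in source order, but distinct chains share no
    dependence, so each chain can become an independent task with no
    explicit synchronization.
    """
    return [list(range(d + r, n, d)) for r in range(d)]

def run_chained(a, b, d):
    """Execute the loop chain by chain; each chain could be its own task."""
    for chain in chain_partition(len(a), d):
        for i in chain:
            a[i] = a[i - d] + b[i]
    return a
```

Because the chains are independent, executing them in any order (or concurrently) produces the same result as the original sequential loop.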