Parallel processing: a smart compiler and a dumb machine
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Register allocation & spilling via graph coloring
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Polycyclic Vector scheduling vs. Chaining on 1-Port Vector supercomputers
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
StaCS: a Static Control Superscalar architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Minimum register requirements for a modulo schedule
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Loop optimization for horizontal microcoded machines
ICS '90 Proceedings of the 4th international conference on Supercomputing
Overview of a high-performance programmable pipeline structure
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Vector register design for polycyclic vector scheduling
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Effects of Loop Fusion and Statement Migration on the Speedup of Vector Multiprocessors
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Compile-time scheduling of vector activities on the CRAY-2 is studied using a simplified model of the vector instruction stream. Because of several hardware characteristics of the machine, an approach that draws on the authors' experience with array-processor microcode scheduling is shown to be practical. It calls for a loop scheduling pass followed by a resource allocation pass. Actual benchmarks of the resulting code are presented, showing speed-ups as large as 50% over the current CFT77 compiler. Our results also give a new perspective on the comparison of chaining and non-chaining vector processor architectures.