Squeezing more CPU performance out of a Cray-2 by Vector block scheduling

Authors:
C. Eisenbeis; W. Jalby; A. Lichnewsky
Affiliations:
I.N.R.I.A.; Domaine de Voluceau; 78153 Le Chesnay CEDEX
Venue:
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Year:
1988

Abstract

Compile-time scheduling of vector activities on the CRAY-2 is studied using a simplified model of the vector instruction stream. Owing to several hardware characteristics of the machine, an approach building on the authors' experience with microcode scheduling for array processors is shown to be practical. It consists of a loop-scheduling pass followed by a resource-allocation pass. Benchmarks of the resulting code show speed-ups of up to 50% over the current CFT77 compiler. Our results also offer a new perspective on the comparison between chaining and non-chaining vector processor architectures.
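The abstract describes the method only at a high level. Below is a minimal sketch, under our own assumptions, of what a two-pass "schedule, then allocate" scheme can look like on a simplified vector instruction stream; it is not the authors' algorithm. Everything concrete here is hypothetical: the unit classes (`mem`, `add`, `mul`), the 70-cycle occupancy per vector instruction, the eight vector registers, the critical-path priority, the linear-scan allocator, and the names `VOp`, `list_schedule` and `allocate_registers`. Chaining is deliberately not modelled: a consumer starts only after its producer has finished, and each unit (including a single memory port) runs one instruction at a time.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class VOp:
    name: str
    unit: str                 # assumed unit classes: "mem", "add", "mul"
    cycles: int               # assumed occupancy: startup + vector length
    deps: list = field(default_factory=list)  # names of producer ops
    writes_reg: bool = True   # False for stores (no vector result)


def critical_path(ops):
    """Longest cycle-weighted path from each op to the end of the DAG."""
    by_name = {op.name: op for op in ops}
    succs = {op.name: [] for op in ops}
    for op in ops:
        for d in op.deps:
            succs[d].append(op.name)
    memo = {}

    def cp(n):
        if n not in memo:
            memo[n] = by_name[n].cycles + max((cp(s) for s in succs[n]), default=0)
        return memo[n]

    return {op.name: cp(op.name) for op in ops}


def list_schedule(ops):
    """Pass 1: greedy list scheduling with critical-path priority.

    No chaining: an op starts only after all its producers have finished,
    and each functional unit executes one op at a time.
    """
    prio = critical_path(ops)
    start, finish, unit_free = {}, {}, {}
    while len(start) < len(ops):
        ready = [op for op in ops
                 if op.name not in start and all(d in finish for d in op.deps)]
        op = max(ready, key=lambda o: prio[o.name])  # ties: program order
        t = max([finish[d] for d in op.deps] + [unit_free.get(op.unit, 0)])
        start[op.name], finish[op.name] = t, t + op.cycles
        unit_free[op.unit] = finish[op.name]
    return start, finish


def allocate_registers(ops, start, finish, num_regs=8):
    """Pass 2: linear-scan assignment of vector registers V0..V{num_regs-1}.

    A result is live from its producer's start until its last consumer
    finishes (without chaining, consumers read it for their whole run).
    """
    last_use = {op.name: finish[op.name] for op in ops}
    for op in ops:
        for d in op.deps:
            last_use[d] = max(last_use[d], finish[op.name])
    free = list(range(num_regs))   # min-heap of free register numbers
    active = []                    # min-heap of (end_of_lifetime, register)
    assignment = {}
    for op in sorted((o for o in ops if o.writes_reg), key=lambda o: start[o.name]):
        while active and active[0][0] <= start[op.name]:
            heapq.heappush(free, heapq.heappop(active)[1])
        if not free:
            raise RuntimeError(f"register pressure too high at {op.name}")
        reg = heapq.heappop(free)
        assignment[op.name] = f"V{reg}"
        heapq.heappush(active, (last_use[op.name], reg))
    return assignment


# Toy loop body A = B + s*C (hypothetical timings, not measured data).
ops = [
    VOp("ldB", "mem", 70),
    VOp("ldC", "mem", 70),
    VOp("mul", "mul", 70, ["ldC"]),
    VOp("add", "add", 70, ["ldB", "mul"]),
    VOp("stA", "mem", 70, ["add"], writes_reg=False),
]
start, finish = list_schedule(ops)
regs = allocate_registers(ops, start, finish)
print(max(finish.values()), regs)
# → 280 {'ldC': 'V0', 'ldB': 'V1', 'mul': 'V2', 'add': 'V0'}
```

In this toy model, the priority function issues `ldC` first so that the multiply overlaps the second load on the shared memory port, giving 280 cycles; issuing the same instructions strictly in program order would give 350. Keeping allocation as a separate pass lets the scheduler ignore register limits, at the cost of possibly needing to reschedule (here signalled by the `RuntimeError`) when more than eight results are live at once.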