Polycyclic Vector Scheduling vs. Chaining on 1-Port Vector Supercomputers

Authors:
J. H. Tang;E. S. Davidson;J. Tong
Affiliations:
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI;Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI;The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, Louisiana
Venue:
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Year:
1988

Abstract

This paper studies the impact of chaining and several instruction scheduling schemes on one-memory-port vector supercomputers, illustrated by the Cray-1 and Cray-2. Because the Cray-2 vector processor lacks instruction chaining, it requires a different instruction scheduling scheme from that of the Cray-1. Situations are characterized in which simple vector scheduling can generate optimal code, which fully utilizes at least one functional unit, for machines with chaining. With enough registers, polycyclic scheduling, even without chaining, guarantees full utilization of one functional unit, after an initial transient, for loops with acyclic dependence graphs. Workloads are represented by vectorizable Livermore Fortran Kernels (LFKs). The effectiveness of polycyclic scheduling on the Cray-2 is compared with that of optimal simple vector scheduling on the Cray-1. The speedup of polycyclic vector scheduling on the Cray-2 over the schedule produced by the current CFT77 compiler is also presented for several vectorizable LFKs.
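
The claim about full utilization of one functional unit can be illustrated with a toy resource-bound calculation. The sketch below is not taken from the paper: the loop body, the unit names, the latencies, and the helper functions `steady_state_interval` and `serial_time` are all hypothetical, and the timing model is deliberately simplified (it does not reproduce Cray-1 or Cray-2 timings). It assumes each vector instruction occupies its unit for one cycle per element, and it compares a fully serialized schedule with the steady state reached when iterations overlap so that the busiest unit never idles.

```python
from collections import Counter

# Hypothetical loop body A(i) = B(i) * C(i) + D(i), as (unit, latency) pairs.
# Latencies are illustrative values, not measured machine parameters.
ops = [
    ("memory", 20),    # load B
    ("memory", 20),    # load C
    ("multiply", 10),  # B * C
    ("memory", 20),    # load D
    ("add", 10),       # (B * C) + D
    ("memory", 20),    # store A
]

def steady_state_interval(ops, vector_length):
    """Cycles per iteration when the busiest unit is kept fully occupied."""
    per_unit = Counter(unit for unit, _ in ops)
    busiest_unit, count = max(per_unit.items(), key=lambda kv: kv[1])
    return busiest_unit, count * vector_length

def serial_time(ops, vector_length):
    """Cycles per iteration if every vector op waits for the previous one
    to finish completely (pessimistic: no chaining and no overlap)."""
    return sum(latency + vector_length for _, latency in ops)

VL = 64  # assumed vector length
unit, interval = steady_state_interval(ops, VL)
serial = serial_time(ops, VL)
print(f"busiest unit: {unit}")
print(f"overlapped steady state: {interval} cycles per iteration")
print(f"fully serialized: {serial} cycles per iteration")
print(f"{unit} utilization when serialized: {interval / serial:.2f}")
```

In this toy model the single memory port is the bottleneck, so the best any schedule can do is keep that port busy every cycle. Reaching that bound in practice also requires enough vector registers to hold the values of the overlapped iterations, which this sketch does not model.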