The CRAY-1 computer system

Authors:
Richard M. Russell
Affiliations:
Cray Research, Inc., Minneapolis, MN
Venue:
Communications of the ACM - Special issue on computer architecture
Year:
1978

Abstract

This paper describes the CRAY-1, discusses the evolution of its architecture, and gives an account of some of the problems that were overcome during its manufacture.

The CRAY-1 is the only computer to have been built to date that satisfies ERDA's Class VI requirement (a computer capable of processing from 20 to 60 million floating point operations per second) [1].

The CRAY-1's Fortran compiler (CFT) is designed to give the scientific user immediate access to the benefits of the CRAY-1's vector processing architecture. An optimizing compiler, CFT, “vectorizes” innermost DO loops. Compatible with the ANSI 1966 Fortran Standard and with many commonly supported Fortran extensions, CFT does not require any source program modifications or the use of additional nonstandard Fortran statements to achieve vectorization. Thus the user's investment of hundreds of man-months of effort to develop Fortran programs for other contemporary computers is protected.
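
As a rough illustration of what "vectorizing" an innermost loop means, the sketch below contrasts an element-at-a-time loop with the same loop processed in fixed-length strips, each strip standing in for one vector operation. It is not CFT's output or algorithm. The function names, the SAXPY-style loop body, and the strip length of 64 elements are assumptions chosen for illustration and are not taken from the abstract.

```python
# Illustrative sketch only: models the idea of vectorizing an innermost loop.
# VECTOR_LENGTH = 64 is an assumed strip size, not a value stated in the abstract.
VECTOR_LENGTH = 64


def saxpy_scalar(a, x, y):
    """Element-at-a-time form, analogous to an innermost DO loop."""
    result = list(y)
    for i in range(len(x)):
        result[i] = a * x[i] + y[i]
    return result


def saxpy_strip_mined(a, x, y, vl=VECTOR_LENGTH):
    """Same computation, processed in strips of at most vl elements."""
    result = list(y)
    for start in range(0, len(x), vl):
        stop = min(start + vl, len(x))
        # Each strip stands in for a single vector operation over the slice.
        result[start:stop] = [
            a * xi + yi for xi, yi in zip(x[start:stop], y[start:stop])
        ]
    return result
```

Both functions compute the same result. The point of the second form is that the source loop is unchanged from the programmer's view, while the work is regrouped into fixed-length strips, which is the sense in which the abstract says no source modifications are needed.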