The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Authors:
B. Ramakrishna Rau;David W. L. Yen;Wei Yen;Ross A. Towle
Affiliations:
Cydrome, Inc., Los Gatos, CA;Sun Microsystems;Cydrome, Inc., Los Gatos, CA;Apogee Software
Venue:
Computer
Year:
1989

Abstract

The Cydra 5 is a heterogeneous multiprocessor system that targets small work groups or departments of scientists and engineers. Its two types of processors are functionally specialized for the different components of the workload found in a departmental setting. The Cydra 5 numeric processor, based on a directed-data-flow architecture, provides consistently high performance on a broad class of numerical computations. The interactive processors offload all nonnumeric work from the numeric processor, leaving it free to spend all its time on the numeric application. The I/O processors permit high-bandwidth I/O transactions with minimal involvement from the interactive or numeric processors. The system architecture and the directed-data-flow architecture are described. The design decisions and tradeoffs for the numeric processor are examined, and the main memory system is discussed. Some reflections on the design issues are offered.