A Processor Architecture for Horizon

Authors:
M. R. Thistle; B. J. Smith
Affiliations:
Institute for Defense Analyses, Supercomputing Research Center, Lanham, Maryland; Tera Computer Company, P.O. Box 25418, Washington, DC
Venue:
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
Year:
1988

Abstract

Horizon is a scalable shared-memory Multiple Instruction stream, Multiple Data stream (MIMD) computer architecture under independent study at the Supercomputing Research Center (SRC) and Tera Computer Company. It is composed of a few hundred identical scalar processors and a comparable number of memories, sparsely embedded in a three-dimensional nearest-neighbor network. Each processor has a horizontal instruction set that can issue up to three floating-point operations per cycle without resorting to vector operations. Each processor will be capable of several hundred million floating-point operations per second (MFLOPS), so that the system as a whole can reach its performance target of 100 billion (10^11) FLOPS.

This paper describes the architecture of the processor in the Horizon system. In the fashion of the Denelcor HEP, the processor maintains a variable number of Single Instruction stream, Single Data stream (SISD) processes, called instruction streams. The memory latency introduced by the large shared memory is hidden by switching context (that is, switching instruction streams) every machine cycle. The processor's functional units are pipelined to achieve high computational throughput; however, pipeline dependencies are hidden from user code. Hardware mechanisms manage processor resources to guarantee the anonymity and independence of instruction streams.
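To make the latency-hiding argument concrete, the following Python sketch models a processor that issues at most one instruction per cycle from a pool of instruction streams, switching streams every cycle. It is a simplified illustration under stated assumptions, not the Horizon design: the function name `simulate`, the round-robin selection policy, and the rule that a stream cannot issue again until a fixed `latency` (in cycles) has elapsed are all hypothetical.

```python
def simulate(num_streams: int, latency: int, cycles: int) -> float:
    """Return the fraction of cycles in which an instruction is issued.

    Hypothetical model (not taken from the Horizon paper):
      * after a stream issues an instruction, it must wait `latency`
        cycles (e.g. for a memory reference) before it may issue again;
      * each cycle, the processor issues from the next ready stream in
        round-robin order, or idles if no stream is ready.
    """
    ready_at = [0] * num_streams  # earliest cycle at which each stream may issue
    next_stream = 0               # where the round-robin search starts
    issued = 0
    for cycle in range(cycles):
        # Scan all streams once, starting from the round-robin pointer.
        for offset in range(num_streams):
            s = (next_stream + offset) % num_streams
            if ready_at[s] <= cycle:
                ready_at[s] = cycle + latency  # stream now waits on its result
                issued += 1
                next_stream = (s + 1) % num_streams
                break  # at most one instruction issued per cycle
    return issued / cycles
```

In this model a single stream keeps the processor busy only 1/`latency` of the time (`simulate(1, 4, 100)` returns 0.25), while utilization reaches 1.0 once the number of streams is at least the latency (`simulate(4, 4, 100)` returns 1.0). This is the intuition behind switching instruction streams every cycle: with enough independent streams, no single stream's wait leaves the pipeline idle.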