The Warp Computer: Architecture, Implementation, and Performance

Authors: M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam
Affiliation: Carnegie Mellon Univ., Pittsburgh, PA (all authors)
Venue: IEEE Transactions on Computers
Year: 1987

Abstract

The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, giving a peak computation rate of 100 MFLOPS. The Warp array can be extended with more cells to accommodate applications that can use the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation, as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX-11/780-class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated in light of system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.
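The peak figure above is simple arithmetic: ten cells at 10 MFLOPS each give 100 MFLOPS, reached only when every cell performs a floating-point operation on every cycle. To make the linear-array dataflow concrete, the following Python sketch models a hypothetical program for such an array: a textbook systolic convolution (FIR filter) in which each cell keeps one weight resident, samples move right through two registers per cell, and partial sums move right through one register per cell. The function names (`peak_mflops`, `systolic_fir`, `direct_fir`) and the register layout are illustrative assumptions, not Warp's actual cell design, instruction set, or programming language.

```python
from typing import List, Sequence

CELLS = 10            # typical Warp array size, as stated in the abstract
MFLOPS_PER_CELL = 10  # peak rate of one cell, as stated in the abstract


def peak_mflops(cells: int, mflops_per_cell: float) -> float:
    """Peak rate of a linear array: every cell busy on every cycle."""
    return cells * mflops_per_cell


def systolic_fir(weights: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Cycle-by-cycle model (hypothetical) of a linear systolic array computing
    y[i] = sum_k weights[k] * xs[i - k], with samples before xs[0] taken as 0.

    Cell k keeps weights[k] resident. Samples advance one register per beat
    and cell k reads the sample 2k registers in, so samples cross the array at
    half the speed of partial sums (one register per cell). As a result, the
    partial sum for y[i] meets sample xs[i - k] exactly at cell k.
    """
    k_cells = len(weights)
    if k_cells == 0:
        raise ValueError("need at least one cell")
    n = len(xs)
    x_regs = [0.0] * (2 * k_cells)  # x_regs[p]: sample that entered p beats ago
    y_regs = [0.0] * k_cells        # y_regs[k]: partial sum currently in cell k
    outputs: List[float] = []
    for t in range(n + k_cells - 1):
        # Shift phase: a new sample and a fresh partial sum enter cell 0.
        x_in = xs[t] if t < n else 0.0
        x_regs = [x_in] + x_regs[:-1]
        y_regs = [0.0] + y_regs[:-1]
        # Compute phase: every cell does one multiply-add in the same beat.
        for k in range(k_cells):
            y_regs[k] += weights[k] * x_regs[2 * k]
        # The last cell now holds the finished output y[t - k_cells + 1].
        if t >= k_cells - 1:
            outputs.append(y_regs[-1])
    return outputs


def direct_fir(weights: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Sequential reference implementation of the same convolution."""
    return [sum(w * xs[i - k] for k, w in enumerate(weights) if i - k >= 0)
            for i in range(len(xs))]


if __name__ == "__main__":
    print(peak_mflops(CELLS, MFLOPS_PER_CELL))  # 100
    w = [0.5, 0.25, 0.25]
    x = [4.0, 8.0, 0.0, 4.0]
    print(systolic_fir(w, x))  # [2.0, 5.0, 3.0, 4.0]
    print(direct_fir(w, x))    # [2.0, 5.0, 3.0, 4.0]
```

In this model all cells perform one multiply-add per beat in parallel, so an n-sample stream through K cells completes in n + K - 1 beats; the K - 1 extra beats are the pipeline fill latency. The two-speed data movement is what lets each partial sum collect its products without any value being broadcast to more than one cell at a time.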