The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, giving it a peak computation rate of 100 MFLOPS. The Warp array can be extended to include more cells to accommodate applications capable of using the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX 11/780-class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated in light of system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.
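To make the peak-rate arithmetic and the idea of linearly connected cells concrete, here is a minimal sketch in Python. It is not Warp's programming language, compiler output, or hardware model. The function names (`peak_mflops`, `horner_pipeline`) and the choice of polynomial evaluation as the workload are assumptions made purely for illustration. Each simulated cell holds one coefficient, receives a value from its left neighbor, does one multiply-add, and passes the result to the right.

```python
def peak_mflops(cells, mflops_per_cell=10):
    """Peak rate of a linear array: cells times the per-cell rate.

    The 10 MFLOPS default is the per-cell figure quoted in the abstract.
    """
    return cells * mflops_per_cell


def horner_pipeline(coeffs, xs):
    """Evaluate a polynomial on a simulated linear array of cells.

    Hypothetical illustration only: cell i holds coeffs[i] (highest
    degree first) and computes acc * x + coeffs[i] on the (x, acc)
    pair handed over by its left neighbor, one step per cycle.
    """
    n = len(coeffs)
    links = [None] * n  # links[i]: value latched at the output of cell i
    results = []
    # Trailing bubbles (None) flush the last inputs through the array.
    stream = list(xs) + [None] * (n - 1)
    for item in stream:
        incoming = (item, 0.0) if item is not None else None
        new_links = []
        for i in range(n):
            # Cell 0 reads from the host; others read the left neighbor's
            # output from the previous cycle.
            inp = incoming if i == 0 else links[i - 1]
            if inp is None:
                new_links.append(None)
            else:
                x, acc = inp
                new_links.append((x, acc * x + coeffs[i]))
        links = new_links
        if links[-1] is not None:
            results.append(links[-1][1])  # value leaving the last cell
    return results


if __name__ == "__main__":
    print(peak_mflops(10))
    # p(x) = 2x^2 + 3x + 1 on a three-cell array
    print(horner_pipeline([2.0, 3.0, 1.0], [0.0, 1.0, 2.0]))
```

Once the pipeline is full, every cell works on a different input in the same cycle, which is why the peak rate scales with the number of cells. Sustained rates on a real machine depend on the application and are not modeled here.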