iWarp is a parallel architecture developed jointly by Carnegie Mellon University and Intel Corporation. The iWarp communication system supports two widely used interprocessor communication styles: memory communication and systolic communication. This paper describes the rationale, architecture, and implementation for the iWarp communication system.

The sending or receiving processor of a message can perform either memory or systolic communication. In memory communication, the entire message is buffered in the local memory of the processor before it is transmitted or after it is received. Communication therefore begins or terminates at the local memory. In conventional message-passing methods, both the sending and the receiving processor use memory communication. In systolic communication, individual data items are transferred as they are produced, or are used as they are received, by the program running at the processor. Memory communication is flexible and well suited for general computing, whereas systolic communication is efficient and well suited for speed-critical applications.

A major achievement of the iWarp effort is the derivation of a common design that satisfies the requirements of both the systolic and the memory communication styles. This is made possible by two important innovations in communication: (1) program access to communication and (2) logical channels. The former allows programs to access data as they are transmitted and to redirect portions of messages to different destinations efficiently. The latter increases the connectivity between the processors and guarantees communication bandwidth for classes of messages. These innovations have provided a focus for the iWarp architecture. The result is a communication system that provides a total bandwidth of 320 MBytes/sec and that is integrated on a single VLSI component with a 20 MFLOPS plus 20 MIPS long instruction word computation engine.
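The contrast between memory and systolic communication described above can be illustrated with a small sketch. It is not iWarp code and does not model the hardware: the function names, the event log, and the use of a Python generator as the producing program are all assumptions made for illustration. The only point it shows is ordering. With memory communication the whole message is buffered before any item is used, whereas with systolic communication each item is used as soon as it is produced.

```python
from typing import Iterator, List


def produce(n: int, log: List[str]) -> Iterator[int]:
    """Hypothetical producer: emits n data items one at a time."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def memory_communication(n: int, log: List[str]) -> List[int]:
    """Buffer the entire message in (simulated) local memory, then consume it."""
    buffer = list(produce(n, log))  # the whole message is staged first
    received = []
    for item in buffer:
        log.append(f"consume {item}")
        received.append(item)
    return received


def systolic_communication(n: int, log: List[str]) -> List[int]:
    """Consume each item as it is produced, with no whole-message buffer."""
    received = []
    for item in produce(n, log):  # items flow through one by one
        log.append(f"consume {item}")
        received.append(item)
    return received


if __name__ == "__main__":
    mem_log: List[str] = []
    sys_log: List[str] = []
    memory_communication(3, mem_log)
    systolic_communication(3, sys_log)
    print("memory:  ", mem_log)
    print("systolic:", sys_log)
```

Both functions deliver the same data; only the event order in the log differs, which is the property the abstract attributes to the two styles.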