iWarp is a system architecture for high-speed signal, image, and scientific computing. The heart of an iWarp system is the iWarp component: a single-chip processor that requires only the addition of memory chips to form a complete system building block, called the iWarp cell. Each iWarp component contains both a powerful computation engine (20 MFLOPS) and a high-throughput (320 MBytes/sec), low-latency (100-150 ns) communication engine for interfacing with other iWarp cells. Because of its strong computation and communication capabilities, the iWarp component is a versatile building block for high-performance parallel systems, ranging from special-purpose systolic arrays to general-purpose distributed-memory computers. These systems can support fine-grain parallel and coarse-grain distributed computation models simultaneously in the same system. An iWarp system can include a large number of cells: the initial iWarp demonstration system consists of an 8x8 torus of iWarp cells, delivering more than 1.2 GFLOPS, and the system can be expanded to include up to 1,024 cells. This paper describes the iWarp architecture and how it supports various communication models and system configurations.
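The aggregate figures quoted above can be checked with a little arithmetic. The following is a minimal sketch, not part of the iWarp software: it assumes that peak system performance is simply the cell count times the 20 MFLOPS per-cell figure, and that the torus is two-dimensional with wraparound links in both dimensions. The function names and the 32x32 layout used for the 1,024-cell case are hypothetical, chosen only for illustration.

```python
# Per-cell peak rate taken from the abstract (20 MFLOPS).
CELL_MFLOPS = 20


def peak_gflops(rows, cols, cell_mflops=CELL_MFLOPS):
    """Ideal aggregate peak of a rows x cols array, in GFLOPS.

    Assumption: peak scales linearly with the number of cells.
    """
    return rows * cols * cell_mflops / 1000.0


def torus_neighbors(row, col, rows, cols):
    """Coordinates of the four neighbors of a cell in a 2D torus.

    Assumption: wraparound links in both dimensions, so edge cells
    connect to the opposite edge.
    """
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west": (row, (col - 1) % cols),
        "east": (row, (col + 1) % cols),
    }


if __name__ == "__main__":
    # 8x8 demonstration system: 64 cells.
    print(peak_gflops(8, 8))
    # Hypothetical 32x32 layout for the 1,024-cell maximum.
    print(peak_gflops(32, 32))
    # Corner cell (0, 0) wraps around to row 7 and column 7.
    print(torus_neighbors(0, 0, 8, 8))
```

Under these assumptions, 64 cells give 1.28 GFLOPS of peak capacity, consistent with the "more than 1.2 GFLOPS" quoted for the demonstration system, and 1,024 cells would give about 20.5 GFLOPS.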