Distributed-memory systems have traditionally had great difficulty performing network I/O at rates proportional to their computational power. The problem is that the network interface must support network I/O for a supercomputer while having computational and memory-bandwidth resources similar to those of a workstation. As a result, the network interface becomes a bottleneck. In this article we present an I/O architecture that addresses this problem and supports high-speed network I/O on distributed-memory systems. The key to good performance is to partition the work appropriately between the distributed-memory system and the network interface. Some communication tasks are performed on the distributed-memory parallel system, since it is more powerful and less likely to become a bottleneck than the network interface. Tasks that do not parallelize well are performed on the network interface, and hardware support is provided for the most time-critical operations. This architecture has been implemented for the iWarp distributed-memory system and has been used by a number of applications. We describe this implementation, present performance results, and use application examples to validate the main features of the I/O architecture.
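As a rough illustration of the partitioning idea described above, the following Python sketch splits one communication task into a part that parallelizes and a part that does not. The abstract does not say which tasks are placed where, so the choice of checksumming is an assumption made for illustration only. It uses the standard 16-bit ones'-complement Internet checksum, and the function names (`node_partial_sums`, `interface_finalize`) are hypothetical. The per-byte summing is spread over the compute nodes, and only the short serial combine step is left to the network interface.

```python
from typing import List


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum; odd-length data is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def node_partial_sums(payload: bytes, num_nodes: int) -> List[int]:
    """Parallelizable part (assumed to run on the compute nodes).

    Each hypothetical node sums its own slice. Slices are an even number
    of bytes so that 16-bit word boundaries line up across nodes.
    """
    words = (len(payload) + 1) // 2
    per_node = max(2, -(-words // num_nodes) * 2)  # bytes per node, even
    return [
        ones_complement_sum(payload[off:off + per_node])
        for off in range(0, len(payload), per_node)
    ]


def interface_finalize(partials: List[int]) -> int:
    """Serial part (assumed to run on the network interface).

    Combines the small list of partial sums and complements the result.
    """
    total = sum(partials)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes(range(256)) * 3 + b"\x07"
    serial = ~ones_complement_sum(payload) & 0xFFFF
    split = interface_finalize(node_partial_sums(payload, num_nodes=8))
    print(serial == split)
```

The split works because the ones'-complement sum is associative and commutative, so the interface handles one small value per node instead of every byte of the payload. Whether the iWarp implementation divides checksumming this way is not stated in the abstract.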