The network interfaces of existing multicomputers require a significant amount of software overhead to provide protection and to implement message-passing protocols. This paper describes the design of a low-latency, high-bandwidth, virtual-memory-mapped network interface for the SHRIMP multicomputer project at Princeton University. Without sacrificing protection, the network interface achieves low latency by using virtual memory mapping and write-latency-hiding techniques, and it obtains high bandwidth by providing a user-level block data transfer mechanism. We have implemented several message-passing primitives in an experimental environment, demonstrating that our approach can reduce message-passing overhead to a few user-level instructions.