In most computer systems, software overhead dominates the cost of messaging, reducing delivered performance, especially for short messages. Efficient software messaging layers are needed to deliver the hardware performance to the application level and to support tightly-coupled workstation clusters. Illinois Fast Messages (FM) 1.0 is a high-speed messaging layer that delivers low latency and high bandwidth for short messages. For 128-byte packets, FM achieves bandwidths of 16.2 MB/s and one-way latencies of 32 µs on Myrinet-connected SPARCstations (user-level to user-level). For shorter packets, we have measured one-way latencies of 25 µs, and for larger packets, bandwidth as high as 19.6 MB/s (delivered bandwidth greater than OC-3). FM is also superior to the Myrinet API messaging layer, not just in terms of latency and usable bandwidth, but also in terms of the message half-power point ($n_{1/2}$), which is two orders of magnitude smaller (54 vs. 4,409 bytes). We describe the FM messaging primitives and the critical design issues in building a low-latency messaging layer for workstation clusters. Several issues are critical: the division of labor between host and network coprocessor, management of the input/output (I/O) bus, and buffer management. To achieve high performance, messaging layers should assign as much functionality as possible to the host. If the network interface has DMA capability, the I/O bus should be used asymmetrically, with the host processor moving data to the network and exploiting DMA to move data to the host. Finally, buffer management should be extremely simple in the network coprocessor and match queue structures between the network coprocessor and host memory. Detailed measurements show how each of these features contributes to high performance.
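The half-power point $n_{1/2}$ mentioned above is the message size at which delivered bandwidth reaches half of its peak. As a minimal sketch, assume the common linear cost model $T(n) = t_0 + n / r_\infty$, which the abstract does not state; under that model $n_{1/2} = t_0 \cdot r_\infty$. The function names and the parameter values below are hypothetical and chosen only for illustration; they are not FM or Myrinet API measurements.

```python
import math


def delivered_bandwidth(n_bytes, t0_s, r_inf):
    """Bandwidth (bytes/s) for n-byte messages, assuming T(n) = t0 + n / r_inf."""
    return n_bytes / (t0_s + n_bytes / r_inf)


def half_power_point(t0_s, r_inf):
    """Message size (bytes) at which delivered bandwidth equals r_inf / 2.

    Setting n / (t0 + n / r_inf) = r_inf / 2 and solving for n gives n = t0 * r_inf.
    """
    return t0_s * r_inf


# Hypothetical parameters, for illustration only (not measured values).
T0 = 4e-6      # assumed fixed per-message overhead, in seconds
R_INF = 20e6   # assumed peak (asymptotic) bandwidth, in bytes per second

n_half = half_power_point(T0, R_INF)
bw_at_half = delivered_bandwidth(n_half, T0, R_INF)

# By construction, bandwidth at n_half is half of the peak.
assert math.isclose(bw_at_half, R_INF / 2)
```

In this model a smaller $n_{1/2}$ means a smaller fixed per-message overhead relative to peak bandwidth, so short messages already obtain a large fraction of the peak.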