In this paper, we study how to minimize the latency of a message through a network that consists of a number of store-and-forward stages. This research is especially relevant for today's low-overhead communication systems that employ dedicated processing elements for protocol processing. We develop an abstract pipeline model that reveals a crucial performance tradeoff between the overhead of the bottleneck stage and the bandwidth of the remaining stages. We exploit this tradeoff to develop a suite of fragmentation algorithms designed to minimize message latency. We also provide an experimental methodology for constructing customized pipeline algorithms that adapt to specific system characteristics and application workloads. By applying this methodology to the Myrinet-GAM system, we have improved its latency by up to 51%. Our theoretical framework is also applicable to pipelined systems beyond the context of high-speed networks.
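To make the tradeoff concrete, the following is a minimal sketch of a store-and-forward pipeline, not the paper's own model or algorithms. It assumes each stage handles a fragment of `size` bytes in `overheads[i] + per_byte[i] * size` time units, and that a stage can start a fragment only after the previous stage has finished it and the stage itself is free. The function names, the cost parameters, and the even-split strategy are all hypothetical choices made for illustration.

```python
def pipeline_latency(fragments, overheads, per_byte):
    """Time at which the last fragment leaves the last stage.

    Assumed cost model (illustrative only): stage i spends
    overheads[i] + per_byte[i] * size on a fragment of `size` bytes.
    """
    n_stages = len(overheads)
    done = [0.0] * n_stages  # done[i]: when stage i finished its latest fragment
    for size in fragments:
        prev = 0.0  # when this fragment left the previous stage
        for i in range(n_stages):
            # Store-and-forward: wait for the whole fragment and a free stage.
            start = max(prev, done[i])
            prev = start + overheads[i] + per_byte[i] * size
            done[i] = prev
    return done[-1]


def split_evenly(message_size, k):
    """Split message_size bytes into k fragments whose sizes differ by at most 1."""
    base, extra = divmod(message_size, k)
    return [base + 1 if j < extra else base for j in range(k)]


def best_fragment_count(message_size, overheads, per_byte, max_k):
    """Brute-force search over even splits; returns (k, latency)."""
    candidates = range(1, min(max_k, message_size) + 1)
    k = min(
        candidates,
        key=lambda c: pipeline_latency(
            split_evenly(message_size, c), overheads, per_byte
        ),
    )
    return k, pipeline_latency(split_evenly(message_size, k), overheads, per_byte)


if __name__ == "__main__":
    overheads = [1.0, 1.0, 1.0]  # hypothetical fixed cost per fragment, per stage
    per_byte = [1.0, 1.0, 1.0]   # hypothetical cost per byte, per stage
    for k in (1, 2, 3, 12):
        frags = split_evenly(12, k)
        print(k, pipeline_latency(frags, overheads, per_byte))
    print(best_fragment_count(12, overheads, per_byte, max_k=12))
```

Under these assumed costs, a 12-byte message sent as a single fragment takes 39 time units, while three 4-byte fragments take 25. Splitting further eventually hurts, because every extra fragment pays the fixed per-stage overhead again; this is the kind of tension between overhead and overlap that the abstract describes.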