Modeling communication pipeline latency

Authors:
Randolph Y. Wang;Arvind Krishnamurthy;Richard P. Martin;Thomas E. Anderson;David E. Culler
Affiliations:
Computer Science Division, University of California, Berkeley;Computer Science Division, University of California, Berkeley;Computer Science Division, University of California, Berkeley;Department of Computer Science and Engineering, University of Washington, Seattle;Computer Science Division, University of California, Berkeley
Venue:
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Year:
1998

Abstract

In this paper, we study how to minimize the latency of a message through a network that consists of a number of store-and-forward stages. This research is especially relevant for today's low-overhead communication systems that employ dedicated processing elements for protocol processing. We develop an abstract pipeline model that reveals a crucial performance tradeoff involving the effects of the overhead of the bottleneck stage and the bandwidth of the remaining stages. We exploit this tradeoff to develop a suite of fragmentation algorithms designed to minimize message latency. We also provide an experimental methodology that enables the construction of customized pipeline algorithms that can adapt to the specific system characteristics and application workloads. By applying this methodology to the Myrinet-GAM system, we have improved its latency by up to 51%. Our theoretical framework is also applicable to pipelined systems beyond the context of high-speed networks.
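
The tradeoff described in the abstract can be illustrated with a small sketch. This is not the paper's model or any of its algorithms; it is a simplified example under stated assumptions: each store-and-forward stage costs a fixed per-fragment overhead plus a per-byte cost, the message is split into equal-size fragments, and each stage handles one fragment at a time. The function names, the `(overhead, per_byte_cost)` stage representation, and the numeric values are all hypothetical.

```python
def pipeline_latency(message_bytes, num_fragments, stages):
    """Latency of one message split into equal fragments (simplified model).

    stages: list of (overhead, per_byte_cost) pairs, one per store-and-forward
    stage. These parameters are illustrative assumptions, not values or
    notation taken from the paper.
    """
    fragment_bytes = message_bytes / num_fragments
    # Time each stage spends on a single fragment.
    stage_times = [o + g * fragment_bytes for o, g in stages]
    # The first fragment traverses every stage; each later fragment
    # completes one bottleneck-stage time after the previous one.
    return sum(stage_times) + (num_fragments - 1) * max(stage_times)


def best_fragment_count(message_bytes, stages, max_fragments=64):
    """Exhaustively search for the fragment count with the lowest latency."""
    return min(
        range(1, max_fragments + 1),
        key=lambda k: pipeline_latency(message_bytes, k, stages),
    )


if __name__ == "__main__":
    # Hypothetical three-stage pipeline; the middle stage is the bottleneck.
    stages = [(2.0, 0.01), (5.0, 0.02), (2.0, 0.01)]
    k = best_fragment_count(4096, stages)
    print(k, pipeline_latency(4096, k, stages))
```

In this sketch, more fragments increase overlap between stages but pay the bottleneck stage's fixed overhead once per fragment, so latency first falls and then rises as the fragment count grows. With the illustrative values above, the search settles on 4 fragments. The sketch covers only equal-size fragmentation; the abstract does not say which fragmentation schemes the paper's suite of algorithms uses.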