Software overhead in messaging layers: where does the time go?

Authors:
Vijay Karamcheti; Andrew A. Chien
Affiliations:
Department of Computer Science, University of Illinois at Urbana-Champaign, 1304 W. Springfield Avenue, Urbana, IL (both authors)
Venue:
ASPLOS VI: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems
Year:
1994

Abstract

Despite improvements in network interfaces and software messaging layers, software communication overhead still dominates the hardware routing cost in most systems. In this study, we identify the sources of this overhead by analyzing the software costs of typical communication protocols built atop the active messages layer on the CM-5. We show that up to 50–70% of the software messaging costs are a direct consequence of the gap between specific network features (arbitrary delivery order, finite buffering, and limited fault handling) and the user communication requirements of in-order delivery, end-to-end flow control, and reliable transmission. However, virtually all of these costs can be eliminated if routing networks provide higher-level services such as in-order delivery, end-to-end flow control, and packet-level fault tolerance. We conclude that significant cost reductions require changing the constraints on messaging layers: we propose designing networks and network interfaces that simplify or replace the software used to implement user communication requirements.