Effects of communication latency, overhead, and bandwidth in a cluster architecture

Authors:
Richard P. Martin;Amin M. Vahdat;David E. Culler;Thomas E. Anderson
Affiliations:
Computer Science Division, University of California, Berkeley, CA;Computer Science Division, University of California, Berkeley, CA;Computer Science Division, University of California, Berkeley, CA;Computer Science Division, University of California, Berkeley, CA
Venue:
Proceedings of the 24th annual international symposium on Computer architecture
Year:
1997

Abstract

This work provides a systematic study of the impact of communication performance on parallel applications in a high-performance network of workstations. We develop an experimental system in which communication latency, overhead, and bandwidth can be varied independently to observe their effects on a wide range of applications. Our results indicate that current efforts to bring cluster communication performance up to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
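
One way to read the reported linear dependence on overhead is with a back-of-the-envelope model. The sketch below is an illustration only, not the model or the measurements from the paper: it assumes that every message charges the added overhead once at the sender and once at the receiver, and that nothing else in the run changes. The function names and the example numbers (10 s baseline, one million messages per processor) are hypothetical.

```python
def predicted_runtime_s(base_runtime_s, messages_per_proc, added_overhead_us):
    """Baseline runtime plus the extra processor time spent on added overhead.

    Assumption (not from the abstract): each message pays the added
    overhead twice, once on send and once on receive.
    """
    added_s = 2 * messages_per_proc * added_overhead_us * 1e-6
    return base_runtime_s + added_s


def slowdown(base_runtime_s, messages_per_proc, added_overhead_us):
    """Ratio of predicted runtime to baseline runtime."""
    predicted = predicted_runtime_s(base_runtime_s, messages_per_proc, added_overhead_us)
    return predicted / base_runtime_s


if __name__ == "__main__":
    # Hypothetical workload: 10 s baseline, 1,000,000 messages per processor.
    # Raising overhead from 3 to 103 microseconds adds 100 microseconds per event.
    for delta_us in (0.0, 25.0, 50.0, 100.0):
        print(delta_us, round(slowdown(10.0, 1_000_000, delta_us), 2))
```

Under these assumptions the slowdown grows linearly with the added overhead, and its slope is set by how many messages each processor handles per second of baseline runtime. This is one way to see why communication-intensive applications would be the most sensitive to overhead.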