LoGPC: Modeling Network Contention in Message-Passing Programs

Authors:
Csaba Andras Moritz;Matthew I. Frank
Affiliations:
Univ. of Massachusetts, Amherst;Massachusetts Institute of Technology, Cambridge
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2001

References: 17
Cited by: 23

Towards an architecture-independent analysis of parallel algorithms

SIAM Journal on Computing
Active messages: a mechanism for integrated communication and computation

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The impact of communication locality on large-scale multiprocessor performance

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
LogP: towards a realistic model of parallel computation

PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel programming in Split-C

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
The MIT Alewife machine: architecture and performance

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Fast Parallel Sorting Under LogP: Experience with the CM-5

IEEE Transactions on Parallel and Distributed Systems
LoPC: modeling contention in parallel algorithms

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Effects of communication latency, overhead, and bandwidth in a cluster architecture

Proceedings of the 24th annual international symposium on Computer architecture
Assessing Fast Network Interfaces

IEEE Micro
Limits on Interconnection Network Performance

IEEE Transactions on Parallel and Distributed Systems
On the time complexity of broadcast communication schemes (Preliminary Version)

STOC '82 Proceedings of the fourteenth annual ACM symposium on Theory of computing
Exploiting Two-Case Delivery for Fast Protected Messaging

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Sensitivity of Communication Mechanisms to Bandwidth and Latency

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors

Cited by:

SimpleFit: A Framework for Analyzing Design Trade-Offs in Raw Architectures

IEEE Transactions on Parallel and Distributed Systems
P-3PC: A Point-to-Point Communication Model for Automatic and Optimal Decomposition of Regular Domain Problems

IEEE Transactions on Parallel and Distributed Systems
Incorporating memory layout in the modeling of message passing programs

Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Parallel, distributed and network-based processing
Quantification of memory communication

High performance scientific and engineering computing
Opportunities and challenges in application-tuned circuits and architectures based on nanodevices

Proceedings of the 1st conference on Computing frontiers
Toward an analytical solution to task allocation, processor assignment, and performance evaluation of network processors

Journal of Parallel and Distributed Computing
The impact of spatial layout of jobs on I/O hotspots in mesh networks

Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Optimal Algorithms for Scheduling Large-Scale Divisible Load on Heterogeneous Systems in Non-blocking Mode of Communication

HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
An Accurate Communication Model of a Heterogeneous Cluster Based on a Switch-Enabled Ethernet Network

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
Optimizing communication overlap for high-speed networks

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Synchronization coherence: A transparent hardware mechanism for cache coherence and fine-grained synchronization

Journal of Parallel and Distributed Computing
Adaptive approaches for efficient parallel algorithms on cluster-based systems

International Journal of Grid and Utility Computing
Modeling advanced collective communication algorithms on cell-based systems

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Assessing contention effects on MPI_alltoall communications

GPC'07 Proceedings of the 2nd international conference on Advances in grid and pervasive computing
Manycore performance analysis using timed configuration graphs

SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
LogGOPSim: simulating large-scale applications in the LogGOPS model

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Toward performance models of MPI implementations for understanding application scaling issues

EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Incorporating memory layout in the modeling of message passing programs

EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Fast barrier synchronization for InfiniBand™

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
LogfP - a model for small messages in InfiniBand

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A contention-aware performance model for HPC-based networks: a case study of the InfiniBand network

Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Total exchange performance modelling under network contention

PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

Abstract

In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11], [19] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message-passing styles.
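
To make the starting point of the model concrete, the following is a minimal sketch of the LogGP point-to-point cost that LoGPC builds on. The abstract does not give the LoGPC contention equations, so the `contention_delay` argument below is a hypothetical placeholder supplied by the caller, not the paper's formula; the class and function names and the parameter values are likewise illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP machine parameters, all in one time unit (e.g. cycles)."""
    L: float  # network latency
    o: float  # per-message processor overhead (paid on send and on receive)
    g: float  # minimum gap between consecutive short messages
    G: float  # gap per byte for long messages
    P: int    # number of processors


def loggp_message_time(p: LogGPParams, k_bytes: int) -> float:
    """End-to-end time of one k-byte message under plain LogGP:
    send overhead + (k - 1) per-byte gaps + latency + receive overhead."""
    if k_bytes < 1:
        raise ValueError("a message must contain at least one byte")
    return p.o + (k_bytes - 1) * p.G + p.L + p.o


def message_time_with_contention(p: LogGPParams, k_bytes: int,
                                 contention_delay: float = 0.0) -> float:
    """Hypothetical extension: add a caller-supplied contention delay.
    How LoGPC derives this delay is NOT modeled here (assumption)."""
    return loggp_message_time(p, k_bytes) + contention_delay


def contention_fraction(base_time: float, contention_delay: float) -> float:
    """Share of the total time that is attributable to contention."""
    total = base_time + contention_delay
    return contention_delay / total if total > 0 else 0.0


# Illustrative values only; they are not measurements from Alewife.
params = LogGPParams(L=10.0, o=2.0, g=4.0, G=0.5, P=32)
base = loggp_message_time(params, k_bytes=101)
total = message_time_with_contention(params, 101, contention_delay=base)
share = contention_fraction(base, contention_delay=base)
```

In this sketch a contention delay equal to the contention-free time yields a contention share of one half, which is the kind of ratio the abstract's "up to 50 percent" figure describes; the real model would compute the delay from network and interface behavior rather than take it as an input.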