LoPC: modeling contention in parallel algorithms

Authors:
Matthew I. Frank;Anant Agarwal;Mary K. Vernon
Affiliations:
Laboratory for Computer Science, Massachusetts Institute of Technology;Laboratory for Computer Science, Massachusetts Institute of Technology;Computer Sciences Department, University of Wisconsin-Madison
Venue:
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1997

Abstract

Parallel algorithm designers need computational models that take first-order system costs into account but are also simple enough to use in practice. This paper introduces the LoPC model, which is inspired by the LogP model but accounts for contention for message-processing resources in parallel algorithms on a multiprocessor or network of workstations. LoPC takes the L, o, and P parameters directly from the LogP model and uses them to predict the cost of contention, C.

This paper defines the LoPC model and derives the general form of the model for parallel applications that communicate via active messages. Model modifications for systems that implement coherent shared-memory abstractions are also discussed. We carry out the analysis for two important classes of applications that have irregular communication. In the case of parallel applications with homogeneous all-to-any communication, such as sparse matrix computations, the analysis yields a simple rule of thumb and insight into contention costs. In the case of parallel client-server algorithms, the LoPC analysis provides a simple and accurate calculation of the optimal allocation of nodes between clients and servers. The LoPC estimates for these applications are shown to be accurate when compared against event-driven simulation and against a sparse matrix computation on the MIT Alewife multiprocessor.
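To give a feel for the client-server allocation question mentioned above, here is a minimal sketch. It is not the LoPC derivation from the paper. It substitutes textbook exact mean-value analysis (MVA) of a closed queueing network, with clients as customers and servers as identical queueing stations. The function names, the uniform spread of requests over servers, and the choice of client think time as work plus two network latencies are all illustrative assumptions.

```python
def mva_throughput(n_clients, n_servers, think, service):
    """Exact MVA for a closed network with identical, equally loaded servers.

    Assumption (not from the paper): each request visits one server chosen
    uniformly at random, so every server has visit ratio 1 / n_servers.
    Returns system throughput in requests per unit time.
    """
    queue = 0.0        # mean queue length at each server (same by symmetry)
    throughput = 0.0
    for n in range(1, n_clients + 1):
        # Residence time per request: its own service plus the queue it finds.
        residence = service * (1.0 + queue)
        throughput = n / (think + residence)
        # Little's law per server, with the load split across n_servers.
        queue = throughput * residence / n_servers
    return throughput


def best_server_count(nodes, work, latency, handler):
    """Pick the number of servers (1 .. nodes-1) that maximizes throughput.

    Assumption: a client computes for `work`, then pays `latency` each way,
    so its think time is work + 2 * latency; `handler` is the time a server
    spends processing one request.
    """
    think = work + 2.0 * latency
    return max(
        range(1, nodes),
        key=lambda s: mva_throughput(nodes - s, s, think, handler),
    )


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    print(best_server_count(nodes=32, work=100.0, latency=10.0, handler=20.0))
```

The trade-off the sketch exposes is the one the abstract describes: adding servers reduces queueing at each server but removes clients that generate work, so throughput peaks at an intermediate split.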