Effects of synchronization barriers on multiprocessor performance
Parallel Computing
In this paper, we derive bounds on the speedup and efficiency of applications that schedule tasks on a set of parallel processors. We assume that the application runs an algorithm consisting of N iterations, and that before starting its (i+1)st iteration, a processor must wait for data (i.e., synchronize) calculated in the ith iteration by a subset of the other processors in the system. Processing times and the interconnections between iterations are modeled by random variables, possibly with deterministic distributions. Scientific applications that iterate recursive equations are examples of applications that fit this formulation. We study the efficiency of such applications and show that, although efficiency decreases as the number of processors grows, it approaches a nonzero limit as the number of processors tends to infinity. We obtain a lower bound on the efficiency by solving an equation that depends on the distribution of task service times and on the expected number of tasks that must be synchronized. We also show that the lower bound is approached when the topology of the processor graph is "spread-out," a notion we define in the paper.
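The iterative local-synchronization model described above can be illustrated with a small Monte Carlo sketch. The code below is not the paper's analysis; it is a minimal simulation under illustrative assumptions (i.i.d. exponential task service times, each processor synchronizing with itself plus a fixed-size random subset of peers, and hypothetical parameters P, N, and degree) that estimates efficiency as speedup divided by the number of processors.

```python
import numpy as np

def simulate_efficiency(P=64, N=1000, degree=2, seed=None):
    """Sketch of the local-synchronization model: P processors, N iterations.

    Before starting iteration i+1, processor p waits for iteration i to
    finish on itself and on `degree` randomly chosen other processors
    (a stand-in for the subset of tasks that must be synchronized).
    Task service times are assumed i.i.d. exponential with mean 1.
    """
    rng = np.random.default_rng(seed)
    finish = np.zeros(P)               # completion times of the current iteration
    total_work = 0.0                   # total sequential work performed
    for _ in range(N):
        service = rng.exponential(1.0, size=P)
        total_work += service.sum()
        new_finish = np.empty(P)
        for p in range(P):
            peers = rng.choice(P, size=degree, replace=False)
            start = max(finish[p], finish[peers].max())
            new_finish[p] = start + service[p]
        finish = new_finish
    makespan = finish.max()
    speedup = total_work / makespan    # serial time / parallel time
    return speedup / P                 # efficiency

if __name__ == "__main__":
    print(simulate_efficiency(P=64, N=1000, degree=2, seed=0))
```

Increasing P in this sketch shows the qualitative behavior stated in the abstract: efficiency falls as processors are added but levels off rather than dropping toward zero, with the plateau depending on the service-time distribution and the number of synchronized peers.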