IEEE Transactions on Parallel and Distributed Systems
Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for information to arrive. This is especially true of barrier, or global, synchronization, in which every processor must synchronize with every other processor. Nonetheless, barriers are the only form of synchronization explicitly supplied in OpenMP, and they occur whenever collective communication operations are used in MPI. Many applications do not actually require global synchronization; local synchronization, in which a processor synchronizes only with those processors from or to which information or resources are needed, is often adequate. However, when tasks take varying amounts of time, the behavior of a system under local synchronization is more difficult to analyze, since processors do not start tasks at the same time. We show that when the synchronization dependencies form a directed cycle and the task times are geometrically distributed with $p=0.5$, then as the number of processors tends to infinity the processors are working a fraction $2-\sqrt{2}\approx0.59$ of the time. Under global synchronization, by contrast, the time to complete each task is unbounded, increasing logarithmically with the number of processors. Similar results hold for $p\neq0.5$. We also present some combinatorial properties of the synchronization problem with geometrically distributed task times on an undirected cycle. Nondeterministic synchronization is also examined, in which each processor decides randomly at the beginning of each task which neighbor(s) to synchronize with. We show that the expected number of task dependencies for random synchronization on an undirected cycle is the same as for deterministic synchronization on a directed cycle. Simulations are included to extend the analytic results.
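The contrast between the two regimes can be sketched with a small simulation. The recurrence below (a processor starts its next task once both it and its predecessor on the directed cycle have finished their current tasks) is our reading of the model described in the abstract, and the function names are illustrative, not from the paper; geometric task times are taken on $\{1,2,\dots\}$ with mean $1/p$.

```python
import numpy as np

def local_sync_efficiency(n, tasks, p=0.5, seed=0):
    """Local synchronization on a directed cycle of n processors.

    Before starting task k+1, processor i waits for its own task k and for
    task k of its predecessor i-1 (mod n).  Returns the average fraction of
    time processors spend working; per the paper's result this should
    approach 2 - sqrt(2) ~ 0.586 for p = 0.5 as n grows.
    """
    rng = np.random.default_rng(seed)
    finish = np.zeros(n)   # completion time of each processor's current task
    work = np.zeros(n)     # total busy time per processor
    for _ in range(tasks):
        x = rng.geometric(p, size=n)                     # task times, mean 1/p
        start = np.maximum(finish, np.roll(finish, 1))   # wait for predecessor
        finish = start + x
        work += x
    return float(np.mean(work / finish))

def global_sync_round_time(n, tasks, p=0.5, seed=0):
    """Barrier synchronization: each round lasts as long as the slowest of
    n geometric task times, whose mean grows like log2(n) for p = 0.5."""
    rng = np.random.default_rng(seed)
    x = rng.geometric(p, size=(tasks, n))
    return float(x.max(axis=1).mean())
```

Running `local_sync_efficiency(500, 3000)` gives a busy fraction well below 1 but bounded away from 0 as `n` grows, while `global_sync_round_time` keeps increasing with `n`, which is the abstract's point of comparison.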
The simulations show that, for some random-neighbor synchronization models, more heavy-tailed task-time distributions can actually create fewer delays than less heavy-tailed ones when the number of processors is small. They also show the rate of convergence to the steady state for various task distributions and synchronization graphs.
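The random-neighbor model referred to above can be sketched in the same style: before each task, every processor waits for its own previous task and for that of one uniformly chosen neighbor on the undirected cycle. The single-neighbor rule is an assumption drawn from the abstract's wording ("which neighbor(s) to synchronize with"); the paper may also allow synchronizing with both neighbors.

```python
import numpy as np

def random_neighbor_efficiency(n, tasks, p=0.5, seed=0):
    """Nondeterministic synchronization on an undirected cycle.

    At the start of each task, every processor independently picks its left
    or right neighbor with probability 1/2 and waits for that neighbor's
    previous task as well as its own.  Returns the average busy fraction.
    """
    rng = np.random.default_rng(seed)
    finish = np.zeros(n)   # completion time of each processor's current task
    work = np.zeros(n)     # total busy time per processor
    for _ in range(tasks):
        x = rng.geometric(p, size=n)          # geometric task times, mean 1/p
        pick_left = rng.random(n) < 0.5       # coin flip per processor
        neighbor = np.where(pick_left,
                            np.roll(finish, 1),    # left neighbor's finish time
                            np.roll(finish, -1))   # right neighbor's finish time
        finish = np.maximum(finish, neighbor) + x
        work += x
    return float(np.mean(work / finish))
```

Since each round creates exactly one dependency per processor, the expected number of task dependencies matches deterministic synchronization on a directed cycle, consistent with the abstract; swapping `rng.geometric` for a heavier-tailed sampler lets one probe the small-`n` delay comparison mentioned above.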