IEEE Transactions on Parallel and Distributed Systems
Synchronization is often necessary in parallel computing, but it can create delays whenever the receiving processor is idle, waiting for information to arrive. This is especially true of barrier, or global, synchronization, in which every processor must synchronize with every other processor. Nonetheless, barriers are the only form of synchronization explicitly supplied in OpenMP, and they occur whenever collective communication operations are used in MPI. Many applications do not actually require global synchronization; local synchronization, in which a processor synchronizes only with those processors from or to which information or resources are needed, is often adequate. However, when tasks take varying amounts of time, the behavior of a system under local synchronization is more difficult to analyze, since processors do not start tasks at the same time. We show that when the synchronization dependencies form a directed cycle and the task times are geometrically distributed with $p=0.5$, then as the number of processors tends to infinity the processors are working a fraction $2-\sqrt{2}\approx0.59$ of the time. Under global synchronization, by contrast, the time to complete each task is unbounded, increasing logarithmically with the number of processors. Similar results hold for $p\neq0.5$. We also present some combinatorial properties of the synchronization problem with geometrically distributed task times on an undirected cycle. Nondeterministic synchronization is also examined, in which each processor decides randomly at the beginning of each task which neighbor(s) to synchronize with. We show that the expected number of task dependencies for random synchronization on an undirected cycle is the same as for deterministic synchronization on a directed cycle. Simulations are included to extend the analytic results.
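The contrast between the two regimes can be sketched with a small simulation. The recurrence below (a processor starts its next task once both it and its predecessor on the directed cycle have finished their current tasks) is our reading of the model described in the abstract, and the function names are illustrative, not from the paper; geometric task times are taken on $\{1,2,\dots\}$ with mean $1/p$.

```python
import numpy as np

def local_sync_efficiency(n, tasks, p=0.5, seed=0):
    """Local synchronization on a directed cycle of n processors.

    Before starting task k+1, processor i waits for its own task k and for
    task k of its predecessor i-1 (mod n).  Returns the average fraction of
    time processors spend working; per the paper's result this should
    approach 2 - sqrt(2) ~ 0.586 for p = 0.5 as n grows.
    """
    rng = np.random.default_rng(seed)
    finish = np.zeros(n)   # completion time of each processor's current task
    work = np.zeros(n)     # total busy time per processor
    for _ in range(tasks):
        x = rng.geometric(p, size=n)                     # task times, mean 1/p
        start = np.maximum(finish, np.roll(finish, 1))   # wait for predecessor
        finish = start + x
        work += x
    return float(np.mean(work / finish))

def global_sync_round_time(n, tasks, p=0.5, seed=0):
    """Barrier synchronization: each round lasts as long as the slowest of
    n geometric task times, whose mean grows like log2(n) for p = 0.5."""
    rng = np.random.default_rng(seed)
    x = rng.geometric(p, size=(tasks, n))
    return float(x.max(axis=1).mean())
```

Running `local_sync_efficiency(500, 3000)` gives a busy fraction well below 1 but bounded away from 0 as `n` grows, while `global_sync_round_time` keeps increasing with `n`, which is the abstract's point of comparison.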
The simulations show that, for some random-neighbor synchronization models, more heavy-tailed task-time distributions can actually create fewer delays than less heavy-tailed ones when the number of processors is small. They also show the rate of convergence to the steady state for various task distributions and synchronization graphs.
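The random-neighbor model referred to above can be sketched in the same style: before each task, every processor waits for its own previous task and for that of one uniformly chosen neighbor on the undirected cycle. The single-neighbor rule is an assumption drawn from the abstract's wording ("which neighbor(s) to synchronize with"); the paper may also allow synchronizing with both neighbors.

```python
import numpy as np

def random_neighbor_efficiency(n, tasks, p=0.5, seed=0):
    """Nondeterministic synchronization on an undirected cycle.

    At the start of each task, every processor independently picks its left
    or right neighbor with probability 1/2 and waits for that neighbor's
    previous task as well as its own.  Returns the average busy fraction.
    """
    rng = np.random.default_rng(seed)
    finish = np.zeros(n)   # completion time of each processor's current task
    work = np.zeros(n)     # total busy time per processor
    for _ in range(tasks):
        x = rng.geometric(p, size=n)          # geometric task times, mean 1/p
        pick_left = rng.random(n) < 0.5       # coin flip per processor
        neighbor = np.where(pick_left,
                            np.roll(finish, 1),    # left neighbor's finish time
                            np.roll(finish, -1))   # right neighbor's finish time
        finish = np.maximum(finish, neighbor) + x
        work += x
    return float(np.mean(work / finish))
```

Since each round creates exactly one dependency per processor, the expected number of task dependencies matches deterministic synchronization on a directed cycle, consistent with the abstract; swapping `rng.geometric` for a heavier-tailed sampler lets one probe the small-`n` delay comparison mentioned above.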